
Electron apps syncing simple db between each other #498

Closed
ZeeCoder opened this issue Nov 10, 2018 · 19 comments

@ZeeCoder

I'm a bit confused about what I need to do here to have several electron apps communicate with each other and sync a simple key-value db.
Essentially any app could add or modify the contents of the DB over a local network.

I've created a local DB in one of the apps using orbit-db and ipfs, but I'm not sure how to connect the two.
In the replication section of the getting started guide (https://github.com/orbitdb/orbit-db/blob/master/GUIDE.md#replicating-a-database) there seems to be an assumption that there's a "master" DB, which is replicated by the second DB, which is not what I want.

Essentially these apps would need to automatically discover each other, and just sync automatically, without having a "master" that all others replicate.

Another thing that confused me is the "ipfs daemon" approach (https://github.com/orbitdb/orbit-db#module-with-ipfs-daemon), which seems to be recommended for electron apps, but I'm not sure what the difference is between this and my first approach, which just uses the ipfs module.

I'm sure I just need to do my research, I just feel like I don't know where to look. 😅

@aphelionz
Member

Hey, @ZeeCoder! I believe I understand that you want each app to contain one or more orbitdbs, and simply make those available to other electron apps on the same system. Is this correct?

To start our troubleshooting, let's ensure that each of the nodes is connected to the others. OrbitDB requires this. You can do this by trying the following steps:

Let's assume your IPFS variable is simply ipfs.

  1. Running something like ipfs.id().then(console.log) and noting the multiaddr of the node, something like /ip4/127.0.0.1/tcp/[####]/Qm....
  2. Copying that value into the "Bootstrap" field of the IPFS config object in your electron app's JS.
  3. Then, you should be able to run ipfs.swarm.peers() inside each app and see an array of peers, whose ids should match the other apps' ids.
  4. Repeat this process in each app until all apps are "bootstrapped" with all the others.
  5. Run your tests again.

Now, I'm not too sure about how electron networking works, but assuming that they all just communicate with the host network, then you'll just have a bunch of js-ipfs daemons running on a bunch of different ports.
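Something like this is what steps 1 to 3 might look like in code. This is only a sketch; the multiaddr and peer id in Bootstrap are placeholders for the values you noted from the other app's ipfs.id() output.

const IPFS = require("ipfs");

const ipfs = new IPFS({
  EXPERIMENTAL: { pubsub: true },
  config: {
    // placeholder: paste the other app's multiaddr (from its `ipfs.id()`) here
    Bootstrap: ["/ip4/127.0.0.1/tcp/4002/ipfs/QmOtherAppPeerId"]
  }
});

ipfs.on("ready", async () => {
  // Step 1: note this node's id and multiaddrs so the other apps can bootstrap to it
  console.log(await ipfs.id());

  // Step 3: once both apps are running, each should see the other listed here
  console.log(await ipfs.swarm.peers());
});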

@aphelionz
Member

As to your question about the ipfs daemon - if you're using js-ipfs inside of a node app, the default behavior is that the app "is its own" daemon while it's running.
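Roughly, the difference in code is something like this (a sketch; option B assumes you've started a separate ipfs daemon and installed an HTTP API client such as ipfs-http-client, and the exact import shape depends on the client version):

// Option A: embedded js-ipfs – the app "is its own" daemon while it runs.
const IPFS = require("ipfs");
const ipfs = new IPFS({ EXPERIMENTAL: { pubsub: true } });

// Option B: connect to an external `ipfs daemon` over its HTTP API instead, e.g.:
// const ipfsClient = require("ipfs-http-client");
// const ipfs = ipfsClient("http://localhost:5001");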

@ZeeCoder
Author

ZeeCoder commented Nov 10, 2018

Thanks for the fast answer @aphelionz !

I would want the apps to essentially operate on the same DB, and sync each others' changes.
So if app1 adds the following: { username: "some user" };
Then app2 would receive that automatically, and should be able to amend it, which app1 would receive.

On syncing
So I don't need the db1.address then at all?
Is this essentially how it would look?

const IPFS = require("ipfs");
const OrbitDB = require("orbit-db");

const ipfsOptions = {
    EXPERIMENTAL: {
        pubsub: true,
    }
};

// Create IPFS instance
const ipfs = new IPFS(ipfsOptions);

ipfs.on('ready', async () => {
    // Assuming that I got `app2Address` (a multiaddr) from app2 somehow, by running `ipfs.id()` there
    await ipfs.bootstrap.add(app2Address);

    // Create OrbitDB instance
    const orbitdb = new OrbitDB(ipfs);
    const db = await orbitdb.docs("db", { write: ['*'] });
    await db.load();

    // Now whatever I do with db, it's replicated in app2's db (?)
});

Then app2's code would look essentially the same?
Can I add nodes like this as I'm discovering them on the network?
So if I start app3 and app4, do I just add their addresses to ipfs the same way at runtime, and will those apps start syncing?

About the daemon:
Technically electron is a node environment, and it seemed to work just as is 🤔

@aphelionz
Member

That's the idea. The kinda unavoidable thing is that somebody has to start the "genesis" database. From there you have two options to get that data to the other nodes: database duplication or database replication.

  1. Replication, as you pointed out, only sends changes from that db to the other nodes.
  2. Duplication is what you want (each node has the full db with it), the same model as git or a distributed ledger. This is accomplished in orbit by calling db.load() periodically in the app to make sure it has the latest and greatest (a rough sketch follows below).
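A minimal sketch of option 2, assuming orbitdb was created the way the earlier snippets do and using an arbitrary 1-second interval:

async function keepFresh(orbitdb) {
  const db = await orbitdb.docs("db", { write: ["*"] });
  setInterval(async () => {
    await db.load();          // pull whatever has been replicated into memory
    console.log(db.get(""));  // docstore: get("") returns every document
  }, 1000);
}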

@ZeeCoder
Author

ooh, so it doesn't sync automatically? :(
So what does this mean then, isn't this possible?

OrbitDB uses IPFS as its data storage and IPFS Pubsub to automatically sync databases with peers

Anyway, can I run my idea so far through you, just to see if I really get this:

I'd make the apps start their local DB first, with something like the following:

const IPFS = require("ipfs");
const OrbitDB = require("orbit-db");

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

const ipfsOptions = {
    EXPERIMENTAL: {
        pubsub: true,
    }
};

// Create IPFS instance
const ipfs = new IPFS(ipfsOptions);

// I'm intentionally not adding remote ipfs ids here to bootstrap, as I'll discover them later

ipfs.on('ready', async () => {
    // Create OrbitDB instance
    const orbitdb = new OrbitDB(ipfs);
    const db = await orbitdb.docs("db", { write: ['*'] });

    // This is basically polling at this point from all other nodes, which
    // I kinda wanted to avoid as I thought orbit-db would do this
    // automatically when new data is created in a node. 🤷‍♂️
    const tick = async () => {
        await db.load();
        await delay(1000);
        tick();
    };

    tick();
});

// Here I'd expose this app's ID using `ipfs.id()`, with either websockets or a
// simple express server over a fixed port, so that I can discover apps on a local network

// ...imagine express server code here exposing ipfs.id()...

// Here I'd imagine yet another loop that discovers other apps over the
// aforementioned port and gets their ipfs IDs
// (findIpfsIdsOverTheLocalNetwork is made up – it would query that port on local hosts):

const discover = async () => {
    const ipfsIds = await findIpfsIdsOverTheLocalNetwork();
    await Promise.all(ipfsIds.map(id => ipfs.bootstrap.add(id)));
    await delay(1000);
    discover();
};

discover();

All of the above is kinda pseudo-code-ish, but you get the idea.

What I still don't get, though, is why I need a genesis db here. Technically all apps have their own local DB, and the whole idea of being P2P is not to have a central app.
Otherwise I'd just make one app the master with an exposed REST api or something. 🤔

@aphelionz
Member

Somebody, somewhere, has to write the first entry into the CRDT and "start" the database.

I'm curious to see how far you get based on our conversation. Maybe try out the pseudo-code and then report back? You can also be artful about how you call load; it doesn't need to be polling.

@aphelionz
Member

Also, a clarification: pubsub handles the data transfer on the IPFS layer, and load() brings those contents into memory to be queried.
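So, instead of a polling loop, a sketch like this (assuming the store events API of that orbit-db version) only queries when the store reports that new entries arrived from a peer:

async function watchDb(orbitdb) {
  const db = await orbitdb.docs("db", { write: ["*"] });
  await db.load(); // bring the locally cached entries into memory once

  // Fired after entries from a peer have been merged into this store,
  // so a query here already reflects them – no polling loop required.
  db.events.on("replicated", (address) => {
    console.log("replicated from", address, db.get(""));
  });
}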

@ZeeCoder
Author

ZeeCoder commented Nov 11, 2018

I don't want to start working on this until I know the project fits my needs. 😅
For example, having to have a "master" app sounds like the exact opposite of a system that claims to be a

distributed, peer-to-peer database.

Automatic syncing was the other thing that made me think this project might be a good fit, but I'm not so sure anymore.
I'll probably look around a bit more to see if there's something that's actually distributed peer-to-peer, and automatically syncing between nodes, as I don't want to deal with these "low-level" details if possible. 😅

@aphelionz
Member

There's a distinction between "master" / "central" and the term I used, which is "genesis."

If I start a git repository and you eventually clone it, we both have a copy of the database locally, can perform operations on it, and have certain assurances that when we decide to sync up (which can be done at any time, at any interval), functions run on our data will yield the same outputs. The fact that I started it is just a consequence of the fact that somebody had to make the first commit.

In the case of orbitdb, this DOES happen automatically via pubsub as you suggest, on the IPFS layer. It seems like you mostly don't like the fact you have to periodically call db.load to get the database into memory to work with, which is a valid critique and we can perhaps look at keeping a certain amount of data in memory at all times.

@aphelionz
Member

There's an electron implementation of orbit chat here, for reference: https://github.com/orbitdb/orbit-electron

@ZeeCoder
Author

The way I imagined it is less like a git repo (which has to have an initial commit, as all other commits then reference a commit before them) and more like arrays of objects getting merged together.
If you think of the databases as arrays of objects, then each peer would just add new objects to its local array as they come in from other peers.
In this scenario there's no need for a "genesis" array.

That's sort of how I thought this would work.

The part I don't really get is not just why we need a "genesis" DB, but how it gets created.

Like here: #498 (comment)

If I start 3 apps, which one creates such a "genesis" table?

What happens if, after I start them, one app is not connected to the network? While the first two start syncing with each other (let's say app1 created a "genesis" db somehow), would app3 create a genesis db of its own, as it might assume there are no other nodes?

What happens then when it does connect to the network? Would it somehow conflict with app1 and 2, instead of the three apps nicely fetching new data from each other to get synced?

@ZeeCoder
Author

I have so many questions and I'm not sure where to look for answers tbh. 😅

@aphelionz
Member

We'll get there! Let me type up detailed responses tomorrow and get back to you.

@ZeeCoder
Author

Sorry for being so pushy about this @aphelionz btw, I really do appreciate your help! 🙌

@aphelionz
Member

aphelionz commented Nov 13, 2018

Let's start with just a basic walkthrough of what each line is and does in the following example:

// start with dependencies
const IPFS = require("ipfs");
const OrbitDB = require("orbit-db");

const ipfsOptions = {
  repo: "./orbitdb/ipfs",
  EXPERIMENTAL: {
    pubsub: true,
  }
};

// create a js-ipfs node, with its repo at ./orbitdb/ipfs
const ipfs = new IPFS(ipfsOptions);

// Once IPFS is ready, create a new orbitdb instance by passing ipfs
ipfs.on("ready", async () => {
  const orbitdb = new OrbitDB(ipfs);
});

At this point, in your local file system, you'll have some folders - the ipfs repo and an orbitdb-managed folder with the same name as your IPFS node's ID.

$ ls ./orbitdb
ipfs                                           # IPFS repo
Qmacc8APadGneFifUsAfh9mNuARHQTgGHKb41eHP1SxmGD # OrbitDB metadata (keystore, etc)

Back in our javascript we can make a database:

ipfs.on("ready", async() => {
  const orbitdb = new OrbitDB(ipfs);
  const log = await orbitdb.log("/testdb") 
}

and back on the filesystem:

$ ls orbitdb/
ipfs
Qmacc8APadGneFifUsAfh9mNuARHQTgGHKb41eHP1SxmGD  QmeFHiAqnaKe4HgJ4WJcRDJ1gSyPqCwyrYvv8f6d3HHp9u

# So what's in there? A folder with the name of your db
$ ls orbitdb/QmeFHiAqnaKe4HgJ4WJcRDJ1gSyPqCwyrYvv8f6d3HHp9u/
testdb

$ ls -alh orbitdb/QmeFHiAqnaKe4HgJ4WJcRDJ1gSyPqCwyrYvv8f6d3HHp9u/testdb/
total 24K
drwxrwxr-x 2 mark mark 4.0K Nov 12 22:15 .
drwxrwxr-x 3 mark mark 4.0K Nov 12 22:15 ..
-rw-rw-r-- 1 mark mark  284 Nov 12 22:15 000003.log
-rw-rw-r-- 1 mark mark   16 Nov 12 22:15 CURRENT
-rw-r--r-- 1 mark mark    0 Nov 12 22:15 LOCK
-rw-rw-r-- 1 mark mark   57 Nov 12 22:15 LOG
-rw-rw-r-- 1 mark mark   50 Nov 12 22:15 MANIFEST-000002

I can break down what each of these files is, but the gist of what I'm getting at is this: each node / app / peer is going to have a database structure like this, and they can connect via the orbit address generated at database creation:
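For example, a second peer could do something like the following (a sketch using the same API as above; the address string is a placeholder for whatever log.address.toString() prints on the peer that created the db):

// A second peer – its own process with its own ipfs repo and OrbitDB folder.
ipfs.on("ready", async () => {
  const orbitdb = new OrbitDB(ipfs);

  // Open the existing database by its address instead of by name.
  const sameLog = await orbitdb.log("/orbitdb/Qm.../testdb"); // placeholder address
  await sameLog.load();

  sameLog.events.on("replicated", () => {
    console.log(sameLog.iterator({ limit: -1 }).collect());
  });
});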

I guess all of this is tantamount to git init, to stay with the analogy. This answers one of your questions:

If I start 3 apps, which one creates such a "genesis" table?

They all do, and to address:

If you think of the databases as array of object, then each peer would just add new objects to their local array as they come in from other peers.

The databases ARE arrays of objects. Technically they're arrays of ipfs-log objects. Each one DOES independently act on its own copy of the database. https://github.com/orbitdb/ipfs-log

But What About Merging?

You asked a few questions here:

What happens if, after I start them, one app is not connected to the network? While the first two start syncing with each other (let's say app1 created a "genesis" db somehow), would app3 create a genesis db of its own, as it might assume there are no other nodes?

What happens then when it does connect to the network? Would it somehow conflict with app1 and 2, instead of the three apps nicely fetching new data from each other to get synced?

All of these problems are solved by using CRDTs and so-called "Lamport Clocks." You can read up on them, and I can provide the journal articles if you want, but essentially all the timestamps in an orbitdb are "vector" timestamps, meaning entries are sorted not only by a regular timestamp but also by the ID of the node contributing the entry.

So, when it comes time to merge (again, handled by IPFS pubsub), you can get the state of the database using a pure function across a unique and consistent set of values from all peers acting on the DB.
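Purely as an illustration (this is not ipfs-log's actual code, and the entry shapes below are made up), the ordering idea looks roughly like this:

// Each entry carries a logical clock { time, id }; merging sorts by time
// first and breaks ties deterministically with the writer's id.
const fromApp1 = [{ clock: { time: 1, id: "A" }, payload: "a1" }];
const fromApp2 = [{ clock: { time: 1, id: "B" }, payload: "b1" }];
const fromApp3 = [{ clock: { time: 2, id: "C" }, payload: "c1" }]; // joined late

function compareClocks(x, y) {
  if (x.clock.time !== y.clock.time) return x.clock.time - y.clock.time;
  return x.clock.id < y.clock.id ? -1 : x.clock.id > y.clock.id ? 1 : 0;
}

// Every peer that sorts the same set of entries gets the same order,
// so a peer that was offline simply merges in when it reconnects.
console.log([...fromApp1, ...fromApp2, ...fromApp3].sort(compareClocks));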

@aphelionz
Member

Closing this for now and directing people to the new OrbitDB Field Manual. Feel free to open issues and PRs there if you have more questions or if things don't make sense!

@FredyVia

I understand that the genesis database can be created first and then accessed directly by using its address in the three apps. However, I have another problem: after each data change (with { pin: true }), is it OK to take the data directly from IPFS? (Every app's ipfs daemon is always online!) Even if a new 'app100' comes in that only has the address of the database, can it replicate the database? I tried it with the following:

const create = require('ipfs-http-client')
const OrbitDB = require('orbit-db')
function sleep(ms) {
    return new Promise((resolve) => {
        setTimeout(resolve, ms);
    });
}


async function listenPublicDatabase(orbitdb, address) {
    const publicDatabase2 = await orbitdb.open(address)
    console.log('in ipfs2, opened')
    publicDatabase2.events.on('replicated', () => {
        console.log("db1 ed time:", publicDatabase2.all)
    })
    await publicDatabase2.load()
    console.log('in ipfs2, loaded')
    console.log('pub2', publicDatabase2.all)
}
async function main() {
    const ipfs1 = create("http://localhost:50001")
    const ipfs2 = create("http://localhost:50002")
    const orbitdb1 = await OrbitDB.createInstance(ipfs1, { directory: './orbitdb1' })
    const orbitdb2 = await OrbitDB.createInstance(ipfs2, { directory: './orbitdb2' })
    // create a database on ipfs1
    // const publicDatabase1 = await orbitdb1.keyvalue("NetDisk-PublicDatabase1", {
    //     // Give write access to everyone
    //     accessController: {
    //         write: ['*']
    //     }
    // });
    // console.log(publicDatabase1.address.toString())
    const publicDatabase1 = await orbitdb1.open('/orbitdb/zdpuAzEJHgs9ksvYSiGwTG68LRfLJ2N29bC5C6NMxW79KpbKf/NetDisk-PublicDatabase1')
    await publicDatabase1.load()
    console.log('pub1', publicDatabase1.all)
    await publicDatabase1.put("time", '1-249', { pin: true })
    console.log('start to wait')
    await sleep(15000);
    console.log('end to wait')
    console.log('pub1', publicDatabase1.all)
    // await publicDatabase1.close();      // **************
    // await orbitdb1.disconnect()           // **************
    await sleep(15000);
    // try to access the database on ipfs2, but it comes back empty, so listen for the 'replicated' event instead
    listenPublicDatabase(orbitdb2, publicDatabase1.address.toString())
    // update the database ,see if replicate works
    // await publicDatabase1.put("time", new Date().toString(), { pin: true })
}
main()

In the tagged lines (marked with ******************), the result differs depending on whether they are commented out or not. That is, when orbitdb2 has older data and orbitdb1 updates the data (it has been pinned, the ipfs nodes have each other in their bootstrap lists, and to make sure I even ran ipfs swarm connect ***), and then the orbitdb1 node goes offline while ipfs1 stays online, the update still can't be obtained through ipfs2 (I even executed ipfs get zdpuAzEJHgs9ksvYSiGwTG68LRfLJ2N29bC5C6NMxW79KpbKf on the command line). It can only be obtained from an orbitdb node by replicating. Do I need an orbitdb node to run all the time to receive the changes of someone who may be offline?

@FredyVia

FredyVia commented May 15, 2021

My problem is a more complicated version of issue 460.
I separated ipfs and orbitdb.
All of the ipfs nodes are always online.
orbitdb1 comes online, loads the db, does some operations that update values in the database, and goes offline (ipfs1 is still online; orbitdb1 is disconnected). Then orbitdb2 comes online, loads the database, and starts reading the entries. Do the entries contain the ones made by orbitdb1?
Here is the cmd output (before this, both databases were synchronized via db.events.on('replicated')):

pub1 { time: 'new Date().toString()' }   //  read the old value in orbitdb1:'new Date().toString()'
start to wait // wait 15000 ms
end to wait
pub1 { time: 'Sat May 15 2021 18:01:22 GMT+0800 (中国标准时间)' }  // update value(pin:true) in orbitdb1 and sleep 15000 ms
in ipfs2, opened // orbitdb2 online and open("/orbitdb/...../......")
in ipfs2, loaded // orbitdb2.load()
pub2 { time: 'new Date().toString()' } // orbitdb2 get the unchanged value

@phillmac
Member

phillmac commented May 15, 2021

You need at least one other orbit-db instance to be online for the exchange-heads process. After it has replicated any changes, the second instance should be able to load the db from its local cache.
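In other words, a small always-on "replica" peer can fill that role. Here's a rough sketch, assuming the same ipfs-http-client / orbit-db setup as the code above; the port, directory, and address are placeholders:

const { create } = require("ipfs-http-client"); // assumes a client version with a named `create` export
const OrbitDB = require("orbit-db");

async function runReplica(address) {
  const ipfs = create("http://localhost:5001");
  const orbitdb = await OrbitDB.createInstance(ipfs, { directory: "./orbitdb-replica" });

  const db = await orbitdb.open(address);
  await db.load();

  // Stay online so peers that go offline and come back always have someone
  // to exchange heads with; log whenever new entries arrive.
  db.events.on("replicated", () => console.log("replicated:", db.all));
}

runReplica("/orbitdb/zdpu.../NetDisk-PublicDatabase1"); // placeholder address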
