Pushing the Limits of IPFS and OrbitDB

Posted by Mark on January 24, 2018 under Tech Notes

I’ve had my eye on the Interplanetary File System (IPFS) project since at least 2016. It’s the brand-new, content-addressed, non-blockchain protocol on the block and it has all the best parts of git, bittorrent, and a few other things rolled into one. Easily a top-five technology to watch in the coming years. OrbitDB is a serverless, distributed, peer-to-peer database built on top of IPFS.

Early on my involvement was minimal: I lurked on their Freenode channel, built a toy pastebin with it, and talked about it every reasonable chance I got. However, my work on TallyLab has given me an occasion to play with IPFS more.

In this post I want to share with you how I overcame one of our biggest hurdles so far: Getting OrbitDB databases in the browser to replicate with OrbitDBs in node.js.

The Easy Part: The Browser Side of Things

IPFS is a specification with multiple implementations: Go, JavaScript, and Python. Most people run go-ipfs, unless they’re working in the browser where they have to use js-ipfs instead.

One of the cool things about js-ipfs is that it works with node.js and browser javascript. Another cool thing about js-ipfs is that it actually launches a bona-fide IPFS node inside your browser. OrbitDB requires a fresh IPFS node that it can “own,” so this is helpful in also getting Orbit set up.

Here’s a snippet of code from a browser implementation that I’m working on. It’s fairly straightforward. You can get both OrbitDB and IPFS from jsDelivr or any other similar Content Delivery Network.

<script src="https://cdn.jsdelivr.net/npm/ipfs@0.27.7/src/core/index.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/orbit-db@0.19.3/src/OrbitDB.min.js"></script>

Then, in your javascript:

var ipfs = new Ipfs({
  repo: "ipfs/shared",
  config: {
    "Bootstrap": [
      // Leave this blank for now. We'll need it later
    ]
  },
  EXPERIMENTAL: {
    pubsub: true // OrbitDB requires pubsub
  }
});

ipfs.once('ready', async function() {
  var ipfsId = await ipfs.id();
  // Attach to window so the instance is reachable from the console and from the next line
  window.orbitdb = new OrbitDB(ipfs);
  window.globaldb = await window.orbitdb.log(ipfsId.publicKey);
  await globaldb.load();
});

This initializes an OrbitDB instance keyed to your IPFS node’s id. There are a few kinds of OrbitDB stores you can use; we decided to go with a trusty append-only log. Then, you can get the OrbitDB address by running the following:

var db1Addr = globaldb.address.toString()
"/orbitdb/Qmd7onRynKUWNP13uUu3r5XAio1omL1p1gcohQ2pmH9Z48/QmVpwjDqejU7Wu3SGPVJc3JPQBtzqnNHYj5nKp1J1wmDzb"

Then, if you want to replicate, you pass that address into the following code, in another browser:

const orbitdb2 = new OrbitDB(ipfs2, './orbitdb2')
const db2 = await orbitdb2.log(db1Addr)

db2.events.on('replicated', () => {
  const result = db2.iterator({ limit: -1 }).collect().map(e => e.payload.value)
  console.log(result.join('\n'))
})

That code will simply loop through the database records and log them out. If you run the add function in database one, you’ll see the records replicated in database two. Completely peer to peer, with no server setup required. It would be disingenuous to say that there are no servers involved whatsoever, because you’re going through somebody’s IPFS server and somebody’s backbone ISP, but it’s still pretty damn cool.
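To make the "run the add function" step concrete, here is a minimal sketch. The helper name addAndList is my own invention for illustration, and it assumes db is an OrbitDB log store such as globaldb or db2 from the snippets above, exposing the add and iterator methods already used in this post.

```javascript
// Hypothetical helper (not part of OrbitDB): append a value to a log store
// and return every value the store currently holds.
async function addAndList(db, value) {
  // add() resolves once the entry has been written to the local log
  await db.add(value);
  // Same iterator call as in the 'replicated' handler above
  return db.iterator({ limit: -1 }).collect().map((e) => e.payload.value);
}

// In browser one, something like:
//   addAndList(globaldb, 'hello from browser one').then(console.log)
// should then trigger the 'replicated' handler in browser two.
```

Run it from the console in the first browser and watch the second browser's console for the replicated values.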

The Problem: js-ipfs Can’t Swarm

One of the not-so-cool things about js-ipfs is that it’s hard to get it talking to anything besides other browser nodes. This is due to two features that are still being worked on:

  1. Distributed Hash Table (DHT), which searches through the entire network for content.
  2. Circuit Relay, which uses NAT traversal techniques to connect nodes through other nodes.

Please do keep in mind that these technologies are still in the 0.XX version numbers, and are realistically nowhere near production-ready… if you’re a wuss. Is that going to stop us?? Hell no! We’re gonna get these nodes talking, or die trying.

The Solution: Three Attempts But Only One Win

Note: The next part of this post details the exploratory phase of how I came to solve the problem. If you just want to read about the solution, skip to “Attempt Three” below.

Another note: This is all going to work great on localhost, but if you want to do any serious development involving publicly available resources, you’ll need SSL. That is beyond the scope of this post, but I may detail that in the future.

Attempt One: A Standard Issue Go Node

I like things simple, especially when dealing with bleeding-edge fringe technology like IPFS. I wanted to see how far installing a standard go-ipfs node would get me. It turns out, pretty far!

There’s a handy tool called ipfs-update that makes installing the ipfs command line tool easy. Get ye a linux server with go (>= 1.8) installed on it:

$ go get -u github.com/ipfs/ipfs-update
$ cd ipfs-update
$ ./ipfs-update versions
v0.3.2
v0.3.4
v0.3.5
...
v0.4.12
v0.4.13-rc1
v0.4.13
$ sudo ./ipfs-update install v0.4.13

Alternatively, you can get ipfs-update from dist.ipfs.io.

Then, we initialize ipfs and get our ID, which we will need later:

$ ipfs init
$ ipfs id
{
  "ID": "QmYF2WUPCVceq9PgPrCrstKqp2Fpgyst2PMPu7PwJJvBRM",
  "PublicKey": "CAASp...mLKO7iUx6BAEBpZAgMBAAE=",
  "Addresses": null,
  "AgentVersion": "go-ipfs/0.4.13/",
  "ProtocolVersion": "ipfs/0.1.0"
}

Note the ID for future reference.

Then we run the daemon:

$ ipfs daemon
Initializing daemon...
Swarm listening on /ip4/127.0.0.1/tcp/8082
Swarm listening on /ip4/127.0.0.1/tcp/9999/ws
Swarm listening on /ip4/172.17.0.14/tcp/8082
Swarm listening on /ip6/::1/tcp/8082
Swarm listening on /p2p-circuit/ipfs/QmYF2WUPCVceq9PgPrCrstKqp2Fpgyst2PMPu7PwJJvBRM
Swarm announcing /ip4/127.0.0.1/tcp/8082
Swarm announcing /ip4/127.0.0.1/tcp/9999/ws
Swarm announcing /ip4/172.17.0.14/tcp/8082
Swarm announcing /ip6/::1/tcp/8082
API server listening on /ip4/127.0.0.1/tcp/5001
Gateway (readonly) server listening on /ip4/0.0.0.0/tcp/8081
Daemon is ready

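This, more or less, gets you a working IPFS node. With the default config you’ll have a gateway running on port 8080, an API running on port 5001, and then a “swarm” port, 4001. (The output above appears to come from a node whose ports had already been changed, so the numbers differ.)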

Next we need to get the js-ipfs node in our browser and our go-ipfs node talking. Remember, they can’t “swarm” via DHT, so we need to manually tell them about each other. The IPFS folks provide a tutorial and example on exchanging files between a server and browser, but I’ll paraphrase here. First, we need to edit the go-ipfs config to add websocket support:

$ ipfs config edit
"Addresses": {
  "Swarm": [
    "/ip4/0.0.0.0/tcp/4002",
    "/ip4/127.0.0.1/tcp/9999/ws" // Add this line
  ],
  "API": "/ip4/127.0.0.1/tcp/5002",
  "Gateway": "/ip4/127.0.0.1/tcp/9090"
}

The second entry under swarm is the one we want to add. That will enable a websocket listener on port 9999, which is the one we need.
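If you want to double-check the edit, here is a small sketch that looks for websocket listeners in a config object. The helper name websocketListeners is hypothetical, and the config literal simply mirrors the fragment above. I’m assuming you feed it the parsed JSON from `ipfs config show`.

```javascript
// Hypothetical helper: list the Swarm addresses that end up on a websocket transport.
function websocketListeners(config) {
  const swarm = (config.Addresses && config.Addresses.Swarm) || [];
  // A multiaddr is a "/"-separated list of protocols and values; look for the "ws" protocol
  return swarm.filter((addr) => addr.split('/').includes('ws'));
}

// Mirrors the config fragment above
const config = {
  Addresses: {
    Swarm: ['/ip4/0.0.0.0/tcp/4002', '/ip4/127.0.0.1/tcp/9999/ws']
  }
};

console.log(websocketListeners(config)); // [ '/ip4/127.0.0.1/tcp/9999/ws' ]
```

An empty array means the browser will have nothing to dial, since it can only reach the server over websockets.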

Remember that “Bootstrap” config above that I told you we’d circle back to? Well now’s the time to add a line in there:

var ipfs = new Ipfs({
  repo: "ipfs/shared",
  config: {
    "Bootstrap": [
      "/ip4/###.##.###.###/tcp/9999/ws/ipfs/QmbuTRFUhf8EBRjY8rRKcpKEg3ptECvgyqP2PRDij5h8cK"
    ]
  },
  /* ... */
});
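The bootstrap entry is just the server’s address, the websocket port, and its peer id glued together. Here is a sketch that builds one; the helper name wsBootstrapAddr is made up for illustration. Note that the multiaddr protocol name for IPv4 is ip4. DNS names are deliberately not handled.

```javascript
// Hypothetical helper: build a websocket bootstrap multiaddr for a server node.
function wsBootstrapAddr(host, port, peerId) {
  // IPv6 literals contain ":", IPv4 literals do not. DNS names are out of scope here.
  const proto = host.includes(':') ? 'ip6' : 'ip4';
  return '/' + proto + '/' + host + '/tcp/' + port + '/ws/ipfs/' + peerId;
}

// Using the peer id printed by `ipfs id` earlier and the websocket port 9999
const addr = wsBootstrapAddr('127.0.0.1', 9999, 'QmYF2WUPCVceq9PgPrCrstKqp2Fpgyst2PMPu7PwJJvBRM');
console.log(addr);
// /ip4/127.0.0.1/tcp/9999/ws/ipfs/QmYF2WUPCVceq9PgPrCrstKqp2Fpgyst2PMPu7PwJJvBRM
```

Swap 127.0.0.1 for your server’s public address when the browser is not on the same machine.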

I effectively deleted all the other bootstrap nodes for simplicity, but I might add them back later, I dunno. You can check from the browser console that you’re connected by running:

peers = await ipfs.swarm.peers()
peers.map((p) => p.addr.toString())
["/ip4/###.##.###.###/tcp/9999/ws/ipfs/QmbuTRFUhf8EBRjY8rRKcpKEg3ptECvgyqP2PRDij5h8cK"]

It should return your server’s multiaddr.
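If the server is missing from that list, you can also dial it by hand instead of relying on the Bootstrap config. The sketch below wraps that check; ensureConnected is a hypothetical name, and it assumes the peer address strings end with the peer id, as they do in the output above. It relies only on ipfs.swarm.peers() and ipfs.swarm.connect().

```javascript
// Hypothetical helper: make sure we are connected to the node at `addr`,
// dialing it manually if it is not already in the peer list.
async function ensureConnected(ipfs, addr) {
  // The last segment of the multiaddr is the peer id
  const peerId = addr.split('/').pop();
  // Assumption: each peer's address string ends with its peer id
  const isPeer = (p) => p.addr.toString().endsWith(peerId);

  let peers = await ipfs.swarm.peers();
  if (!peers.some(isPeer)) {
    await ipfs.swarm.connect(addr);
    peers = await ipfs.swarm.peers();
  }
  return peers.some(isPeer);
}

// From the browser console, with your own server multiaddr:
//   ensureConnected(ipfs, '<your server multiaddr>').then(console.log)
```

It resolves to true once the server shows up among the connected peers.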

On your server, you can likewise check that your browser peer is connected. Replace the hash below with your browser’s peer id.

$ ipfs swarm peers | grep QmRT7VHB5LQFw4GC6aQHZJVpY8yfz2L5mgD5M2QypAUbrx
/ip4/95.76.83.223/tcp/65078/ipfs/QmRT7VHB5LQFw4GC6aQHZJVpY8yfz2L5mgD5M2QypAUbrx

If you see peers connected in both tests, then a simple test will show that you can indeed transfer files between a go-ipfs node and a js-ipfs node.

$ echo "Created on the server" > test.txt
$ ipfs add test.txt
added QmeV3HasXnvBNNNqmg8HNkHf2HaatsweE7HMHwg7N6cTkD test.txt

This creates a file on our go-ipfs node. We can then retrieve it from the browser like so:

ipfs.files.get("/ipfs/QmeV3HasXnvBNNNqmg8HNkHf2HaatsweE7HMHwg7N6cTkD").then((result) => console.log(result))

Likewise, we can create a “file” in the browser:

buffer = ipfs.types.Buffer.from("Created in the browser")
file = await ipfs.files.add(buffer)
file[0].hash
"QmPqzE7KAkT5UX6Xf84aLabt5zgmmMkHr4ULGV576S437G"

…and get it from the server

$ ipfs cat QmPqzE7KAkT5UX6Xf84aLabt5zgmmMkHr4ULGV576S437G
Created in the browser

However, while this works great for IPFS file transfer, OrbitDB is stranded because it needs a fully instantiated js-ipfs node to sink its teeth into. There’s work underway to hook into IPFS’ REST API, but that hasn’t shipped yet.

Thus, a go-node is a no-go.

Attempt Two: Docker to the Rescu- o wait

I next thought that I could have go-ipfs running on a host server, and then create docker containers for each of the OrbitDB instances I wanted to run. Each container would run its own node.js script, making each one a self-contained IPFS/OrbitDB universe.

This was a good learning experience because it accomplished a number of things, just not what I ultimately needed. No code to show here, just my findings:

  1. It does make for a nice separation of resources.
  2. Each docker container can indeed expose its gateway, API, and swarm.
  3. Each docker container will immediately discover the host machine and connect to it.

However, what you end up with, in essence, is a “local swarm” of IPFS nodes, and since js-ipfs cannot use DHT to traverse a swarm and find content, this is no better than a number of remote servers connecting to each other.

Luckily, I had one more idea that I wanted to try…

Attempt Three: The IPFS is Coming From INSIDE a node.js App

In order to have a proper IPFS node that’s publicly accessible, and to also satisfy OrbitDB’s requirement of having full access to a js-ipfs node, we need to mess with the nesting of components a bit.

Instead of creating a node.js script that tries to connect to a running go node, why not use js-ipfs to create an IPFS node inside of a node.js script?

Here’s a proof of concept. This would live in a node.js file like index.js and run on the server side.

const IPFS = require('ipfs')
const OrbitDB = require('orbit-db')

const ipfsOptions = {
  config: {
    "Addresses": {
      "Swarm": [
        "/ip4/127.0.0.1/tcp/4001",
        "/ip6/::/tcp/4001",
        "/ip4/127.0.0.1/tcp/4002/ws"
      ],
      "Announce": [],
      "NoAnnounce": [],
      "API": "/ip4/127.0.0.1/tcp/5001",
      "Gateway": "/ip4/127.0.0.1/tcp/8080"
    }
  },
  EXPERIMENTAL: {
    pubsub: true
  }
}

// Create IPFS instance
const ipfs = new IPFS(ipfsOptions)

ipfs.on('error', (e) => console.error(e))
ipfs.on('ready', async () => {
  const id = await ipfs.id()
  console.log(id);
  const orbitdb = new OrbitDB(ipfs, "./testreplicate")
  // The same orbitdb address from the beginning of the post
  var testDBAddr = "/orbitdb/Qmd7onRynKUWNP13uUu3r5XAio1omL1p1gcohQ2pmH9Z48/QmVpwjDqejU7Wu3SGPVJc3JPQBtzqnNHYj5nKp1J1wmDzb" // Your orbitdb
  const replicateddb = await orbitdb.log(testDBAddr)
  replicateddb.events.on("replicated", (address) => {
    console.log(replicateddb.iterator({ limit: -1 }).collect())
  })
})

There it is. If you add elements to your browser’s OrbitDB, they will be replicated here so long as the two nodes are directly connected. You’ll need to change your browser code to connect to this new IPFS id. However, everything else should work as expected.
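To find the address the browser should bootstrap against, you can pick the websocket entry out of the id object that the script logs. This is a sketch under one assumption: that the ipfs.id() result carries an addresses array of multiaddr strings. The helper name websocketAddress and the placeholder peer id QmServerPeerId are both made up for illustration.

```javascript
// Hypothetical helper: pick the websocket multiaddr out of an ipfs.id() result.
function websocketAddress(identity) {
  // Assumption: identity.addresses is an array of multiaddrs (strings or objects with toString)
  const addrs = (identity.addresses || []).map((a) => a.toString());
  return addrs.find((a) => a.split('/').includes('ws')) || null;
}

// Example shape only; "QmServerPeerId" is a placeholder, not a real peer id
const identity = {
  id: 'QmServerPeerId',
  addresses: [
    '/ip4/127.0.0.1/tcp/4001/ipfs/QmServerPeerId',
    '/ip4/127.0.0.1/tcp/4002/ws/ipfs/QmServerPeerId'
  ]
};

console.log(websocketAddress(identity)); // /ip4/127.0.0.1/tcp/4002/ws/ipfs/QmServerPeerId
```

Paste the result into the browser’s “Bootstrap” array, replacing 127.0.0.1 with the server’s public address if the browser is on another machine.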

Now, I don’t absolutely love this, as it feels more “centralized” than I’d like. However, there’s nothing stopping somebody from dockerizing this script and running numerous instances of it.

Conclusion

You CAN have replication between client and server-side OrbitDBs so long as you nest your IPFS node inside of a node.js script, and you connect both nodes manually. You won’t have to jump through these hoops once any of the following conditions is met, which should happen in a near-future version of either OrbitDB or js-ipfs:

  1. js-ipfs implements DHT (pull request)
  2. js-ipfs implements Circuit Relay (pull request)
  3. OrbitDB implements REST API support (github issue)

Please email me any feedback you may have. Finally, see you at H.O.P.E. 2018 :)

Find this post and more on Mark's blog, mrh.io.