Skip to content
This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

pub/sub - publish / subscribe #64

Open
3 tasks
jbenet opened this issue Oct 9, 2015 · 77 comments
Open
3 tasks

pub/sub - publish / subscribe #64

jbenet opened this issue Oct 9, 2015 · 77 comments

Comments

@jbenet
Copy link
Member

jbenet commented Oct 9, 2015

We've known for some time we need to layer a pub/sub system on IPFS. We should try to reuse other things. The least work, the better. But it should be a simple protocol, easy to implement, well-layered, and meshing well with the rest of IPFS abstractions.

Requirements

  • very, very fast
  • flexible (maybe different topology-forming algorithms)
  • multiple modalities (single publisher, multiple publishers, etc)
  • support both encrypted and unencrypted streams (encrypted again, this is above the regular libp2p encryption -- and specific to the pub/sub group)
  • support privately encrypted channels (ie user supplied keys)
  • layers over IPRS to do discovery

We need to:

  • do a survey of relevant {literature, protocols, and implementations}.
  • decide on a protocol
  • build it into libp2p.

I likely won't have time to do a proper survey until late Nov or Dec. If you'd like to speed this along, post links to great {papers, systems} here for me.

Relevant to research:

@jbenet jbenet changed the title pub/sub pub/sub - publish / subscribe Oct 9, 2015
@davidar
Copy link
Member

davidar commented Oct 9, 2015

Cc #42

@bharrisau
Copy link

It might be easier start with a basic [slow] implementation before doing the high performance multicast P2P thing. For example, this paper (https://www.cs.utexas.edu/~yzhang/papers/mad-info09.pdf) has two modes depending on how active the group is.

The basic implementation would be as simple as:

  • Pick the IPNS address you want to subscribe to changes at
  • Append signed message with nodeID and TTL/expiry to DHT at IPNS address
  • Owner of IPNS address checks all subscriptions and sends a notifyIpnsUpdate message to each node

Encrypted streams with group-specific encryption, multicast and multiple publishers can then be deferred to the more advanced implementation.

@bharrisau
Copy link

I guess the basic could be made even simpler by using the backlink ideas (ipfs/ipfs#31) instead of a new interface to the routing. You then only need a new P2P message to notify a node of changes.

@spikebike
Copy link

I stumbled upon this "Decentralized Reddit using a DHT to store content and a blockchain to rank it"
https://news.ycombinator.com/item?id=10391996

The first comment seems particularly relevent:
liamzebedee 21 hours ago
Re: the hosting of topics/subreddits in the DHT, I've done quite a lot of research [1] into a very innovative yet not well known P2P publish-subscribe network design [2] from some Norweigan computer scientists that removes the role of hosting for nodes not interested in a topic, even designing a decentralised microblogging platform on top of it [3].

[1] http://liamz.co/wp-content/uploads/2015/03/Computer-Science-Extended-Essay_Liam-Edwards-Playne.pdf
[2] http://acropolis.cs.vu.nl/~spyros/www/papers/PolderCast.pdf
[3] BitWeav http://liamz.co/wp-content/uploads/2015/03/whitepaper.pdf

I'm reading the papers, but it seems like some very interesting discussion for pubsub. In particular Poldercast's finding a subset of nodes with a particular interest and only those nodes host that interest reminds me of IPFS's policy of not downloading anything unless you ask for it.

I'll read the papers before I make any specific pub/sub recommendations.

@jbenet
Copy link
Member Author

jbenet commented Oct 18, 2015

@bharrisau agreed on starting simple to get something working and moving to more efficient constructions later.

@spikebike [1] link is broken.

@davidar
Copy link
Member

davidar commented Oct 18, 2015

@jbenet it should be

[1] http://liamz.co/wp-content/uploads/2015/03/Computer-Science-Extended-Essay_Liam-Edwards-Playne.pdf

@spikebike
Copy link

@davidar @jbenet thanks, right, fixed.

@rabble
Copy link

rabble commented Nov 8, 2015

The xmpp stuff proved really hard in comparison to federation doing pubsubhubub.

@jbshirk
Copy link

jbshirk commented Nov 13, 2015

Maybe not new to folks here, but new to me:
https://hackpad.com/Probabilistic-data-structures-7UPPH2soDvw

@jbshirk
Copy link

jbshirk commented Nov 13, 2015

Now I see that discussion about the use of Bloom filters is already underway:
https://github.com/ipfs/ipfs/issues/31#issuecomment-55875124

@jbenet
Copy link
Member Author

jbenet commented Nov 26, 2015

See also http://www.w3.org/TR/push-api/ (HT @nicola)

@KrishnaPG
Copy link

Came across this (IPFS) while looking for pubsub + rpc over webtorrent. Out of the all existing pubsub and rpc protocols, the best one that comes close to practical use is WAMP / AutoBhan. It facilitates both pubsub and rpc over web (means browser to browser rpc + pub/sub) and light-weight enough to run on IOT devices (raspberry-pi etc.).

Its easy to setup and highly performant. However, the major problem with WAMP is: it is 'routed' / 'brokered'. If there is a way this WAMP can be integrated with webRTC/bittorrent (webTorrent), for P2P, then it would pave way for next generation IOT apps.

For PubSUB - here is one sample functionality what we are trying to achieve with IOT:

  • Imagine large file served through bittorrent (means, many chunks of the file are served from multiple sources)
  • Now, imagine these chunks are all updated by different sources / sensors independently (like rows in a databases)
  • Whenever a chunk is updated, all the readers connected to that chunk should be updated with that new data. Just like BitTorrent, except the connection between the file chunk and reader 'stays alive' (like comet / long-poll of http).

Similarly for RPC - here is one sample functionality that we are looking to achieve:

  • Imagine large scientific data file (once again spread across hosts as chunks served by, say bittorrent), each chunk containing large set of records
  • We should be able to run a function over all the records, but the data should not be moved across machines. Rather copies of the function should get executed as an RPC call over each host (with only part of the data local to that host), similar to HDFS + map-reduce.

Is it possible to achieve above kind of functionality with IPFS? If not yet, I would suggest to strongly consider integrating WAMP as pubsub+rpc part of the protocol (rather than reinventing the wheel). Autobahn implementations of WAMP comes with clients for many languages and performance metrics are also very good.

@fazo96
Copy link

fazo96 commented Nov 29, 2015

@KrishnaPG as soon as pub/sub is implemented, you should be able to do all that! You can use IPNS to expose each node's chunks. You can aggregate the data by having a list of the nodes on each node. For RPC calls, you can't do that using IPFS, but you can have the nodes download the code to execute from somewhere using IPNS, publish their results, then you'd have to aggregate them.

You can't really do the RPC stuff you described but your usecase would work really well using only IPFS for all your networking (assuming it has pub/sub implemented) you just need to implement that functionality in a different way.

Also keep in mind that IPFS has global deduplication 👍 But pub/sub is not implemented yet.

@KrishnaPG
Copy link

Thank you @fazo96 .

Yes, you are right - Doing RPC (moving the code to the data and executing it locally on each chunk) may require an additional layer on top of IPFS (which involves additional functionality that involves job queues, schedulers, retry mechanism and result aggregators).

If we look at it, fundamentally RPC is kind of opposite to the basic file-read or pub/sub (in terms of data flow direction).

For example, in a simple file-read/copy, data goes from the disk/chunk --> the client/reader. Whereas, in RPC the data (the code to be executed) has to go from the client --> the disk/chunk and get executed, and the results should either go back to the client (if the size is small) or get stored as additional files/chunks on the local machine (if the results data size is large).

This requirement to be able to create additional chunks/files on the host locally may need support from the base IPFS, though.

On the other-hand, there is another radical way to look at this.

That is, treating every operation (including file_read, copy, delete etc.) as RPC call, and allowing transformations over basic operations. For example, consider this

                Fn()
client ------------------> chunk

In a normal read operation, the Fn = get_me_chunk(), withget_me_chunk being the usual built-in file-read Op.

And when we need to execute, say do_something on the chunk it would become Fn = do_something(get_me_chunk()) with both functions getting executed locally on that chunk. The client would send the do_something code to the chunk and get back results as usual (or gets back the details of additional chunks created and stored as results).

Fn here can be thought of being similar to HTTP Verb (GET/PUT etc.). The verb can take pre-defined functions (the usual CRUD ops), and also custom-defined ops (where the function code is passed along as the request body).

This model treats RPC as first-class citizen (where all the regular operations, such as file CRUD operations and notifications are implemented on top of RPC). Not sure how difficult/easy it would be to do this with present architecture, though.

For the pubsub, wondering how easy/complex it would be to reuse/integrate the Autobahn. If it can be done, then RPC comes for free on top of it (Demos).

@jbenet
Copy link
Member Author

jbenet commented Dec 1, 2015

AFAICT, autobahn requires websockets, that's too much, we need something more basic that can run over any transport. we also need a pub/sub that can scale to millions of subscribers -- this isnt going to cut it: http://wamp-proto.org/why/#unified_routing -- we need a protocol that creates spanning trees/dags with fan out on its own, using measured latencies/bw in the network, etc. basically, a serious protocol from the multicast literature.

@jbenet
Copy link
Member Author

jbenet commented Dec 1, 2015

@KrishnaPG you should read more into how IPFS works and how the protocols work. suggest also looking at:

@KrishnaPG
Copy link

Thanks @jbenet I was looking for the spec info, your pointers are helpful.

As for WAMP, yes - it started out as websocket based initially. Now, it is decoupled and works with any message based transport (http://wamp-proto.org/faq/#is_websocket_necessary_for_wamp)

However, my intention in pointing to WAMP was not to use it as is, but rather to adapt its pubsub+RPC part of the spec while removing the broker part (replacing it with whatever routing the ipfs uses, dht etc..)

WAMP is a perfect protocol for IOT requirements, but the strong dependency on router/broker is a deal-breaker.

@jbenet
Copy link
Member Author

jbenet commented Dec 2, 2015

@KrishnaPG ah ok, good they generalized it

@fsantanna
Copy link

Hi,
I see a lot of discussion on how to implement pub/sub, but not really about the semantics of pub/sub for IPFS (is it that too obvious?).
How will pub/sub be exposed to users (the API)?
The idea is to have something as simple as

machine-1$ ipfs sub <topic>
<hash-1>  # receive when this is published
<hash-2>
...

machine-2$ ipfs pub <topic> <hash-1>

machine-3$ ipfs pub <topic> <hash-2>

or I am missing something here?

@fazo96
Copy link

fazo96 commented Dec 14, 2015

@fsantanna

I think the bare minimum pub/sub should expose this kind of interface:

ipfs.sub(target_node_id, handler)

handler would be called every time IPFS finds a new hash published by target_node_id. Of course a more complicated API could be built, this is the bare minimum but still very useful implementation.

CLI version:

$ ipfs sub $target_node_id
/ipfs/...
/ipfs/...
# A new line is emitted every time a new record is found

@davidar
Copy link
Member

davidar commented Dec 15, 2015

Re RPC: we've discussed this briefly before, but I'm not sure what the actual plans are

@fsantanna
Copy link

@fazo96
In this case, the subscriber has to know about (and subscribe to) every single potential content provider of his interest.
Shouldn't pubsub decouple publishers from subscribers?

@Stebalien
Copy link
Member

@RangerMauve

So is the floodsub:$topic_name pattern the way to go for bypassing the DHT not allowing arbitrary get/put? :P

It's not really an arbitrary get/put. It puts a record mapping CidOfBlock("floodsub:$topic_name") to the peer's ID.

Would a new pubsub implementation be welcome if it conforms to the same public interface?

You're free to do so but it may break in the future. There's a reason this is marked experimental (although we will try to avoid breaking things too badly).

@jachiang

So the topic needs to be known apriori? If a secret channel topic is not made public for example, there is no way it can be exposed?

You do need the topic to subscribe to it (there's no "list all topics" command). However, your node will tell all of its peers which topics it's interested in so they're hardly secret.

How does OpenBazaar put profile info into DHT? Is it only on their OB network I suppose?

Yes. We explicitly do not want to allow arbitrary data on the DHT (too easy to abuse). The correct way to do that is with IPNS (IPNS record points to a "profile" stored in IPFS). IPNS is currently a bit... finicky in practice.

@RangerMauve
Copy link

Thank you for all the details @Stebalien!

From the README, it looks like the JS version of the DHT isn't totally implemented. How does pubsub work in the browser if the DHT support is iffy?

@Stebalien
Copy link
Member

As I said, that particular peer-finding mechanism is a bit of a hack. I doubt that JS uses it. @diasdavid?

@paralin
Copy link

paralin commented Jan 26, 2018

Is there any progression on the pubsub front?

@RangerMauve
Copy link

I've been thinking about the state of pubsub, was wondering if I could get comments on the following:

I think that constructing a minimum spanning tree (MST) for groups is the way to go.

  • Use one of the algorithms out there for Distributed Minimum Spanning Trees
  • For the "distance" use the logical distance between peer IDs
    • Using latency is hard because you'd need to connect to peers before knowing what your latency is
    • Logical distance is what the DHT already uses for logic, so you're likely to be connected to peers with a close distance to you already if you've bootstrapped into the dht
  • Individual nodes that may not be subscribed to much will end up getting more traffic, but there will be less traffic overall in the network
  • Instead of relaying based on listeners, join "networks" with a given ID and broadcast all events through your network
  • Have peers publish to the DHT with which pubsub network they're on for bootstrapping the MST
  • Find more peers by querying the peers you're connected to
  • Subscribing becomes filtering what data your application will react to when events are passed through the node
    • This allows fancier patterns for subscribing like MQTT wildcards
  • Network IDs can be hashes of public keys and packets should be automatically encrypted with the key using AES or something
    • Maybe allow for signing instead of encryption so that there can be nodes that only consume and relay and can't publish
  • Allow for cleartext network names for where security isn't a consideration (a la current pubsub)
  • Maybe network ID is based on a hash of a network description object in IPLD (name, encryption type, etc)

@paralin
Copy link

paralin commented Feb 2, 2018

Distributed minimum spanning tree does look like the right approach for this, I have to say.

@pgte
Copy link

pgte commented Feb 2, 2018

@RangerMauve interesting!

About the "distance" function, I'm thinking that poorly connected nodes could impact the network if the latency or failure rate is not taken into account.
Discovering new nodes will be done through some sort of gossip protocol, right?
Idea: why not also spread the perceived "health" of nodes in a cooperative way?
I think there are already some models for this, from the top of my head:

Perhaps this could bring some type of decentralised measure of "health", which could then be used to calculate distance?

@RangerMauve
Copy link

@pgte The distance function is there to determine which nodes are logically "close" to each other in order to determine who should connect together to form the minimum spanning tree. The main benefit is that it's computationally cheap to determine the logical distance since all you need is the ID of the peer, a node can calculate it's distance to any peer without connecting to it or knowing anything else about it. I'm taking this concept directly from the kademlia DHT design (which is the DHT used in IPFS)

I think peers will only bother learning about the nodes connected to their immediate neighbors, so some sort of gossip protocol will be useful here. I'm still reading up about the different strategies out there for distributed MSPs so I'm not 100% sure what the requirements are yet.

I like the idea of adding weights based on what a node things the "health" of their peers is, I'd be worried about spreading that information though, because two nodes may have a poor connection between each other, but another node may have a good connection to both based on what their physical relationship is.

I think that for an MVP of a MSP type pub-sub protocol, adding in health measures will complicate the implementation, but it should definitely be investigated once we can actually construct a MSP.

@pgte
Copy link

pgte commented Feb 2, 2018 via email

@RangerMauve
Copy link

I've been tinkering around and here's a demo of an algorithm I've made for building a MSP.
In our case the actual weights don't matter too much because they're arbitrary logical distances.

  • Every node will first boostrap by getting a list of random other peers.
  • It will then sort those peers by their logical distance using an xor of their IDs.
  • It will then choose the closest peer that it isn't already connected to and connect to it

I've made a demo with 32 nodes which boostrap with a list of 6 random nodes.

Here's the link

I'm not sure how viable this is yet, so I'll work on other visualizations in order to catch possible partitions forming.

@RangerMauve
Copy link

I've updated my demo to better visualize what's happening.

I've found that having nodes find a random list of other node IDs, then sorting them my logical distance and connecting to the closes one, I always get a fully-connected tree.

I'll explore logic for visualizing messages being sent down the tree (for pub sub) and dropped connections.

@paralin
Copy link

paralin commented Feb 17, 2018

@RangerMauve what you're describing really is just a network topology mesh, where your route priority metric is driven only by logical distance.

Look at the Babel routing algorithm, there is a lot we can learn from that. Batman-adv as well.

@RangerMauve
Copy link

@paralin Yes. I'm not focusing on routing data across the mesh at the moment. I'm focusing on forming the mesh in a way where individual nodes know every little about other nodes or the shape of the graph itself.

The existing pubsub connects to nodes at random and has a chance of having "supernodes" which have too much bandwidth going to them and network partitions which prevent groups of peers from sending events between each other.

Once an efficient and self-healing mesh is set up, it can be a basis for the protocols that will be built on top (pubsub to start, but other things can probably make use of it).

My initial goal is to make a more efficient pubsub.

I love what the Kademlia DHT does for discovering peers that have a resource, but there needs to be something for forming a persistent network for distributed applications that need to get data to all nodes interested in it.

@paralin
Copy link

paralin commented Feb 17, 2018

@RangerMauve Wrt. Babel: I am suggesting we use it as a reference for more reasons -

  • They also limit knowledge of the network topology to only the next hop.
  • They avoid routing loops (black holes).
  • They successfully balance multiple routing metrics, and have experience with understanding which to prioritize when.

The same goes for batman-adv, except that project focuses more on radio transmission optimizations.

This matches with what you're talking about with mesh balancing. It's the same problem we are solving. Routing in the typical internet sense is the same thing as routing here.

It might be a good idea to look at generalizing the concept of routing a bit, and applying it in pub-sub as well as content and network routing.

@RangerMauve
Copy link

@paralin Sounds good. Is the RFC a good place to start?

Babel looks like a protocol that should be used after the actual connections are made between nodes. I'm still at the stage where I'm figuring out which nodes should connect to which, but after that this will be great for making the actual traffic over the mesh be more efficient.

@paralin
Copy link

paralin commented Feb 17, 2018

@RangerMauve I see what you're saying, they are actually two different problems actually when you consider the structure of the open internet, and don't limit connections to link-local.

In a way though it is actually the same thing. Consider this: all of the peers online in the world are actually just next-hop peers in the babel algorithm. What you're doing is trying to figure out what subset of that global set to connect to, and then prioritize further based on that. But it's still the same problem.

The 'brute-force' solution would rank all of the known peers based on available information about them, used to compute a routing metric. You then sort by the routing metric, chose the top N nodes, connect to those, etc.

Babel fits here as a mechanism of generating that routing metric. It's one of many. I think maybe a pluggable system for deriving routing metrics might be a good approach to this. You can run the fast route scoring algorithms first, before exploring a more advanced plan. It might be possible to map this to some sort of opportunistic graph solver.

@RangerMauve
Copy link

@paralin I get what you're saying, but I think that's a bit heavy and requires a node to know a lot more peers. With my current algorithm, a node can search the DHT for the first 20 random nodes it finds, and then connect to the mesh without focusing the load too much in once place. All it needs to do to decide who to connect to is their ID without having to somehow measure latency or any other metrics about them. Otherwise a node would need to connect to the network, learn about potential peers and how they might relate to themselves, and then actually connect to the network. Though this will be more efficient in the long run, the initial setup is likely going to be too long for the time it takes for a typical web app to load.

@paralin
Copy link

paralin commented Feb 17, 2018

@RangerMauve We are in agreement, I'm just generalizing what you're describing a bit.

You chose a next peer to send traffic to, then you build a connection to that peer, then you send the packet to the peer. The question of where is best to send the traffic is the same as the question of where to connect to next. What you are proposing, is to start with no information about the peers, and to randomly connect to some from the DHT, and then optimize from there. This is the same as what I described in my last comment - you start with no information, you make a decision at random, and then as you learn more information about the network, the routing can be optimized further.

Babel is just one example of a protocol to quickly exchange information about available routes in a closed network. It does not apply to this issue directly, but it is an example of an efficient way of exchanging information about the network between peers that happen to already be connected.

If we structure the code in such a way that these routing metrics, and the way they are prioritized and balanced together, are pluggable, then we can actually start to make use of the traffic metrics in LibP2P to make better routing decisions.

@paralin
Copy link

paralin commented Feb 17, 2018

There must be some way to make a incentive-based system like BitSwap, but for incentivizing routing packets through the network. (Different problem, of course :))

@RangerMauve It's a little bit like a particle filter in a way... If you make a measurement of the ping to a peer, for example, the confidence and derived weight that information has on routing decisions can decay over time (temporal decay) as well as when you have other detected changes in the network topology (event-based invalidation). If we have the structure described above for pluggable metrics, you could get really fancy really quickly with algorithms to make, probably.

@ORBAT
Copy link

ORBAT commented Mar 2, 2018

The BitSwap ledger is already based on bytes sent and received, so I'd assume it should be relatively simple (and I do mean relatively) to extend it to also account for routed bytes (maybe as a separate category, or with some weight etc. since message sizes are likely negligible compared to blocks)

@ORBAT
Copy link

ORBAT commented Mar 2, 2018

Oh, and I don't know if someone's already looked into Chord, but there's been some work on how to build pub/sub systems with it:

  • This paper gives the basic idea
  • This paper shows how to implement equality, suffix, prefix and contains predicates for subscriptions
  • This paper gives an overview of content-based pub/sub systems, including DHTs

There are also a couple of different pub/sub-like schemes over Pastry:

I'd argue that mimicing their approaches might be a good way to go. There's likely similar systems on top of Kademlia, so it might be worth while to look into them. Chord, for example, is provably correct and functions well even with high churn.

edit: for an example of a wildly different approach, there's Secure Scuttlebutt that builds on the Scuttlebutt gossip protocol. Having a gossip protocol in libp2p doesn't sound bad at all.

revenge of the edit: @spikebike talked about PolderCast some years ago, and as he said, it seems like an extremely good choice.

@ghost
Copy link

ghost commented Mar 5, 2018

I'm curious if you folks are aware of what these folks are up to.
libp2p/go-libp2p-pubsub#67

@paralin
Copy link

paralin commented Mar 7, 2018

Submitting aeron for us to read about and investigate:

It seems to be a publisher/subscriber system that uses efficient UDP multicast or inter-process communication to get the job done.

@ORBAT
Copy link

ORBAT commented Mar 8, 2018

What's the relationship between this discussion and floodsub? Is floodsub the "official" pub/sub for IPFS, or is this discussion completely independent?

@daviddias
Copy link
Member

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests