Bacalhau project report 20220815
We've been doing a lot of design work and have quite a few WIP branches this week. That's fine, it's the middle of the month :-)
We are also working more on the "master plan part 2" and recruiting - more soon on those items!
Our new publisher interface is how we will integrate with Filecoin.
This will allow us to write results more permanently to upstream storage.
We are collaborating with the Estuary folks and have several storage provider (SP) conversations lined up.
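As a rough sketch, a publisher interface along these lines could support multiple upstream drivers behind one method. This is Python pseudocode for illustration only (Bacalhau itself is written in Go); the class names, registry, and method signature are assumptions, not the final API.

```python
from abc import ABC, abstractmethod


class Publisher(ABC):
    """Writes verified job results to upstream (long-term) storage."""

    @abstractmethod
    def publish(self, job_id: str, local_path: str) -> str:
        """Upload the results directory and return a content identifier."""


class LotusPublisher(Publisher):
    """Would shell out to the lotus CLI to store results on Filecoin."""

    def publish(self, job_id: str, local_path: str) -> str:
        # e.g. subprocess.run(["lotus", ...]) in a real driver
        raise NotImplementedError


class EstuaryPublisher(Publisher):
    """Would POST results to the Estuary / Web3.storage HTTP API."""

    def publish(self, job_id: str, local_path: str) -> str:
        raise NotImplementedError


# Support for more than one publish driver via a simple registry.
PUBLISHERS = {
    "lotus": LotusPublisher,
    "estuary": EstuaryPublisher,
}


def get_publisher(name: str) -> Publisher:
    return PUBLISHERS[name]()
```

The registry is what "support for >1 publish driver" might look like in practice: a job spec names a driver, and the node looks it up at publish time.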
- Add support for >1 publish driver
- Consider how this relates to the verifier (split verifier into output driver and actual verifier)
- Write output driver for Lotus
- Shell out to CLI
- Write output driver for Web3.storage/Estuary
- Simple API driver
- Results publishing interface
- Executor finishes job
- Tell the network: "I've finished", with enough information to verify me
- Verification happens and passes
- Compute nodes publish local folder somewhere
- [Create] -> [Shard] -> [Bid] -> [Lots of sub-jobs run] -> [Verify] -> [Publish]
- Key question: should requestor node or compute node do the publishing?
- NB: Publishing is really backing up
- Lazy reassembling
- Can we avoid any one machine having to have a big enough disk to reassemble the whole result?
- Avoid reassembling!
- Therefore, the compute nodes HAVE to do the publishing (they already have the data)
- Example: S3 Publish drivers → 100 nodes publish individual objects to a bucket
- For now, compute nodes will have to hold the credentials (e.g. keys) for any publish endpoint - we have no way to securely transmit secrets from clients to compute nodes, given that we can't trust the compute nodes
- Improve verifier interface to include verification steps
- Add "publisher" interface that knows how to upload results of jobs once verified
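The "avoid reassembling" argument above can be made concrete: if each compute node uploads only the shard it already holds, under a per-job object key, then the full result set only ever exists as the union of objects in the bucket, and publishing happens strictly after verification. A hedged sketch, assuming an S3-style layout (bucket layout, key scheme, and function names are illustrative, not the real design):

```python
from typing import Optional


def shard_object_key(job_id: str, shard_index: int) -> str:
    # Each compute node writes only the shard it already holds; no
    # single machine ever needs the whole result set on local disk.
    return f"{job_id}/shard-{shard_index:04d}.tar.gz"


def publish_shard(job_id: str, shard_index: int, verified: bool) -> Optional[str]:
    # Publishing is gated on verification having succeeded.
    if not verified:
        return None
    key = shard_object_key(job_id, shard_index)
    # A real S3 driver would upload here, e.g. boto3 put_object(Key=key, ...)
    return key


# 100 nodes, 100 shards: 100 independent small uploads, zero reassembly.
keys = [publish_shard("job-abc", i, verified=True) for i in range(100)]
```

This is also why the compute nodes, not the requestor, do the publishing: the data is already sharded across them, and gathering it anywhere first would defeat the point.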
We are developing a "consistent hashing" style approach to picking a subset of nodes to bid on a given job. This should reduce network traffic still further, allowing us to scale performance much more easily. Only once we have non-O(N^2) network traffic should we start thinking about optimizing the transport.
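A minimal sketch of the idea: every node hashes the job ID and its own ID into the same space, and only the k "closest" nodes bid. XOR distance over a stable hash (Kademlia-style) is one possible metric; the real design may differ, and all names here are illustrative.

```python
import hashlib


def stable_hash(s: str) -> int:
    # A deterministic 64-bit hash, identical on every node.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


def nodes_that_should_bid(job_id: str, node_ids: list[str], k: int = 3) -> list[str]:
    """Pick the k nodes "closest" to the job in hash space.

    Every node can evaluate this locally and bid only if it is selected,
    so per-job bid traffic is O(k) rather than O(N).
    """
    job = stable_hash(job_id)
    return sorted(node_ids, key=lambda n: stable_hash(n) ^ job)[:k]
```

Because the selection depends only on the job ID and the node IDs, every node computes the same answer without any coordination traffic.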
Enrico is working on the dashboard to show network stats over time:
This is still a WIP, more soon! This is the tracking issue.
- Implement "lotus" and "estuary" publisher interfaces
- Implement and test consistent hashing approach