Implement Efficient Set Reconciliation algorithm #3934

Closed · 28 tasks done
pmnoxx opened this issue Feb 9, 2021 · 12 comments
Assignees: pmnoxx
Labels: A-chain (Area: Chain, client & related), A-network (Area: Network), T-core (Team: issues relevant to the core team)

Comments

pmnoxx (Contributor) commented Feb 9, 2021

We should implement and test a few different approaches to the set reconciliation problem.

White paper provided by @abacabadabacaba

https://www.ics.uci.edu/~eppstein/pubs/EppGooUye-SIGCOMM-11.pdf

I found one existing Rust library for set reconciliation: https://crates.io/crates/minisketch-rs
You can find an explanation of how it works here: https://github.com/sipa/minisketch/blob/de98387a83ccd5dd7399333ddcf52e1d4f344e53/README.md

  • Benchmark minisketch-rs
  • Implement IBF in Rust (a minimal sketch follows this list)
  • Benchmark IBF
  • Optimize IBF, e.g. with a faster hash function
  • Add an implementation of the algorithm to estimate the size of the set difference using an IBF
  • Test the estimator
  • Rename sketch to IBF?
  • Benchmark the estimator
  • Make it generic to support both u64 and u32
  • Estimate how good the estimator is
  • Choose a proper size coefficient for the IBF (using 1.5x right now)
  • Can we make the estimator secure? Yes
  • Choose the number of hashes, 3 or 4 (using 3)
  • What if we don't use the estimator, but do log n round trips, each time sending a bigger IBF?
  • Write a doc describing an algorithm that is resistant to malicious attacks (2 versions)
  • Implement the algorithm for syncing - example
  • Add a unit test for it
  • Can we write an algorithm better than one sending 120 bytes × d of metadata in the worst case?
  • Find more research papers / algorithm names
  • Read https://www.ics.uci.edu/~eppstein/pubs/EppGooUye-SIGCOMM-11.pdf
  • Add a hash seed to the IBF / seed check
  • Read how IBLT works
  • Implement a mock graph implementation
  • Implement the graph-sync structure
  • Add tests for it
  • Add benchmarks
  • Investigate "Cuckoo Filter: Practically Better Than Bloom"
  • Write a proposal using https://github.com/near/NEPs/blob/master/0000-template.md
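
For the "Implement IBF in Rust" item, a minimal illustrative sketch, assuming 64-bit keys, 3 hash functions, and a splitmix64-style mixer; every name and constant below is a placeholder, not the actual nearcore implementation:

```rust
// Illustrative only: a minimal Invertible Bloom Filter in the spirit of the
// Eppstein et al. paper. Hash choice and sizing are placeholders.
const NUM_HASHES: u64 = 3;

#[derive(Clone, Copy, Default)]
struct IbfCell {
    count: i64,    // signed, so a subtracted sketch decodes both sides of the difference
    key_xor: u64,  // XOR of all keys hashed into this cell
    hash_xor: u64, // XOR of a per-key checksum, used to recognize "pure" cells
}

struct Ibf {
    cells: Vec<IbfCell>,
    seed: u64, // per-sketch seed, per the "hash seed / seed check" task above
}

impl Ibf {
    fn new(capacity: usize, seed: u64) -> Self {
        // The task list suggests sizing at ~1.5x the expected difference.
        Ibf { cells: vec![IbfCell::default(); capacity], seed }
    }

    // Placeholder splitmix64-style mixer; picking the real hash is an open task.
    fn hash(&self, key: u64, i: u64) -> u64 {
        let mut x = key ^ self.seed ^ i.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        x ^ (x >> 31)
    }

    fn toggle(&mut self, key: u64, delta: i64) {
        let checksum = self.hash(key, u64::MAX);
        for i in 0..NUM_HASHES {
            let idx = (self.hash(key, i) % self.cells.len() as u64) as usize;
            let cell = &mut self.cells[idx];
            cell.count += delta;
            cell.key_xor ^= key;
            cell.hash_xor ^= checksum;
        }
    }

    fn insert(&mut self, key: u64) { self.toggle(key, 1) }
    fn remove(&mut self, key: u64) { self.toggle(key, -1) } // same XORs; only the count differs
}
```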

Known algorithms

Performance
  • inserting 1M elements (3 hashes): 60 ms
  • recovering 10k elements: 0.9 ms
  • recovering 100k elements: 10.5 ms
  • recovering 1M elements: 195 ms

@pmnoxx pmnoxx added the A-chain Area: Chain, client & related label Feb 9, 2021
@pmnoxx pmnoxx self-assigned this Feb 9, 2021
pmnoxx (Contributor, Author) commented Feb 9, 2021

@bowenwang1996

@pmnoxx pmnoxx added the A-network Area: Network label Feb 9, 2021
pmnoxx (Contributor, Author) commented Feb 9, 2021

I tested minisketch; it works, but we won't be able to use it.
It implements a variant that produces a sketch supporting an add operation but no remove operation. We would have to completely recompute sketches on every edge removal, so it is not usable for us.
https://github.com/eupn/minisketch-rs/blob/master/examples/simple.rs
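
For contrast, the property we need is cheap removal plus cell-wise subtraction, both of which an XOR-based IBF gives for free: removing a key applies exactly the same XORs as inserting it (see the sketch under the task list), and subtracting one sketch from another leaves a sketch of just the symmetric difference. A hedged continuation of that illustrative `Ibf` type:

```rust
impl Ibf {
    // Cell-wise subtraction: keys present in both sets cancel out, so the
    // result behaves like a sketch of the symmetric difference alone.
    fn subtract(&self, other: &Ibf) -> Ibf {
        // Both sides must have agreed on size and seed beforehand.
        assert_eq!(self.cells.len(), other.cells.len());
        assert_eq!(self.seed, other.seed);
        let cells = self
            .cells
            .iter()
            .zip(&other.cells)
            .map(|(a, b)| IbfCell {
                count: a.count - b.count,
                key_xor: a.key_xor ^ b.key_xor,
                hash_xor: a.hash_xor ^ b.hash_xor,
            })
            .collect();
        Ibf { cells, seed: self.seed }
    }
}
```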

abacabadabacaba (Collaborator) commented

It was me who provided that white paper, not @evgenykuzyakov.

pmnoxx (Contributor, Author) commented Feb 9, 2021

@abacabadabacaba Sorry, fixed.

pmnoxx (Contributor, Author) commented Feb 9, 2021

I did some early benchmarking:
  • creating a sketch with 1M (32-bit) elements using minisketch-rs: around 2 s
  • my unoptimized IBF code with 1M (64-bit) elements: 72 ms
In the white paper they claim it takes 4 s to insert 1M elements.
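
For the "Benchmark IBF" task, a hypothetical Criterion harness of the shape these numbers could come from; `Ibf` is the illustrative type sketched under the task list, and the 1.5x sizing is the guess from there:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_insert_1m(c: &mut Criterion) {
    c.bench_function("ibf_insert_1m_u64", |b| {
        b.iter(|| {
            let mut ibf = Ibf::new(1_500_000, 42); // ~1.5x coefficient, arbitrary seed
            for key in 0u64..1_000_000 {
                ibf.insert(key);
            }
            ibf // return the sketch so the work isn't optimized away
        })
    });
}

criterion_group!(benches, bench_insert_1m);
criterion_main!(benches);
```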

pmnoxx (Contributor, Author) commented Feb 9, 2021

I tested the time it takes to recover the difference, assuming we use 4 hashes:
  • 10k elements: 1.1 ms
  • 100k elements: 22 ms
  • 1M elements: 284 ms
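
The recovery step being timed here is the standard IBF "peeling" loop from the paper: find a pure cell (one holding exactly one key), emit that key, XOR it back out of its other cells, and repeat. A hedged sketch against the illustrative type above:

```rust
impl Ibf {
    // Peeling decoder. Returns None when the sketch is too small for the
    // actual difference and gets stuck with no pure cell left.
    fn decode(mut self) -> Option<Vec<(u64, i64)>> {
        let mut recovered = Vec::new();
        loop {
            // A cell is "pure" if it holds exactly one key; the checksum
            // guards against XORs of several keys masquerading as one.
            let pure = self.cells.iter().position(|c| {
                (c.count == 1 || c.count == -1)
                    && self.hash(c.key_xor, u64::MAX) == c.hash_xor
            });
            match pure {
                Some(idx) => {
                    let (key, sign) = (self.cells[idx].key_xor, self.cells[idx].count);
                    recovered.push((key, sign)); // sign says which side holds the key
                    self.toggle(key, -sign);     // peel it out of its other cells
                }
                None => {
                    let drained = self.cells.iter().all(|c| c.count == 0 && c.key_xor == 0);
                    return if drained { Some(recovered) } else { None };
                }
            }
        }
    }
}
```

The linear rescan in `position` makes this quadratic in the worst case; a real implementation would keep a worklist of pure cell indices, which is presumably where optimization effort behind the 284 ms figure would go.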

pmnoxx (Contributor, Author) commented Feb 9, 2021

That should be fine: we need to spend about 2 ms verifying each edge, so for 1M edges that is 1,000,000 × 2 ms = 2,000 s, roughly 30 minutes; the recovery times above are negligible by comparison.

pmnoxx (Contributor, Author) commented Feb 10, 2021

@abacabadabacaba You had some ideas during the meeting on how to improve BLT. Would you mind writing them down here?

pmnoxx (Contributor, Author) commented Feb 11, 2021

I implemented the full algorithm for syncing. You can see how it works in a unit test: https://github.com/pmnoxx/near-set-reconciliation/blob/21bcdfee1a1c30e11f8df729fc49ebcc7f0af5cf/sr-playground/src/blt_graph.rs#L99
I'll write a document with an explanation.
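
Until that document exists, here is a hedged sketch of one plausible shape for the exchange, using the "log n round trips, each time sending a bigger IBF" idea from the task list instead of an estimator. `build_ibf`, the starting size, and the in-process "exchange" are stand-ins for illustration, not necessarily what blt_graph.rs does:

```rust
use std::collections::HashSet;

// Illustrative helper: sketch an entire set at a given size.
fn build_ibf(set: &HashSet<u64>, size: usize, seed: u64) -> Ibf {
    let mut ibf = Ibf::new(size, seed);
    for &key in set {
        ibf.insert(key);
    }
    ibf
}

// Try successively larger sketches until one decodes. In the real protocol
// only the sketches cross the wire; both sets are local here for simplicity.
fn reconcile(ours: &HashSet<u64>, theirs: &HashSet<u64>, seed: u64) -> Vec<(u64, i64)> {
    let mut size = 64;
    loop {
        let diff = build_ibf(ours, size, seed).subtract(&build_ibf(theirs, size, seed));
        match diff.decode() {
            // sign +1: we have the key and the peer doesn't; -1: the reverse.
            Some(delta) => return delta,
            // Decode failed: the difference is too large for this size, so double it.
            None => size *= 2,
        }
    }
}
```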

@bowenwang1996 bowenwang1996 added the T-core Team: issues relevant to the core team label Jun 28, 2021
stale bot commented Sep 27, 2021

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Sep 27, 2021
bowenwang1996 (Collaborator) commented

#4112 is merged
