
Reduce congestion via Aggregating ProveCommits via Inner Product Pairing #50

Closed
nicola opened this issue Dec 15, 2020 · 10 comments
Labels: FIP0013 (links an existing discussion item to an existing FIP)

Comments

nicola (Contributor) commented Dec 15, 2020

Problem

(same as #49)

ProveCommits and PreCommits are creating network congestion, driving up the base fee and in turn the cost of SubmitWindowPoSt and PublishStorageDeal.

(however, unlike #49)

ProveCommit messages, and the gas they use, scale linearly with network growth.

Proposed solution

(similar to #49)

Processing multiple ProveCommits at the same time can drastically reduce the gas used per sector, leading to much lower overall gas usage from ProveCommits. We propose to add a new method: "ProveCommitAggregated".

(unlike #49)

The ProveCommitAggregated method allows the gas used in the network to scale sub-linearly with network growth. A miner can submit a single short proof covering many ProveCommits, and the gas used for verification is sublinear in the number of ProveCommits aggregated. In other words, miners could make a single ProveCommitAggregated transaction per day, instead of one per sector or one per batch of sectors.

Today ~6M individual SNARKs are published on chain per day; the largest miner publishes ~600k of them. With this solution, at least in theory, we would need a single SNARK per miner, i.e. ~700 per day. If we instead assume that miners only aggregate every 1,000 proofs on average, the SNARKs published per day would be ~6,000.
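
As a back-of-the-envelope check of these counts, here is a small sketch using only the figures quoted in this comment plus the 10-SNARKs-per-ProveCommit figure from the list below:

```go
// Back-of-the-envelope check of the figures above. The daily SNARK count, the ~700 figure
// and the 1,000-proof batch size come from this comment; the 10 SNARKs per ProveCommit is
// from the "Aggregation" bullet below; the rest is arithmetic.
package main

import "fmt"

func main() {
	const (
		snarksPerDay         = 6_000_000 // ~6M individual SNARKs published on chain per day
		snarksPerProveCommit = 10        // each ProveCommit currently batch-verifies 10 SNARKs
		assumedBatchSize     = 1_000     // assumed average number of proofs aggregated per message
	)

	// Today: one ProveCommit message per sector, each carrying 10 SNARKs.
	fmt.Println("ProveCommit messages per day today:", snarksPerDay/snarksPerProveCommit) // 600000

	// With aggregation, in theory one message per miner per day (~700 total per the comment).
	// If miners instead aggregate every 1,000 proofs on average:
	fmt.Println("aggregated messages per day at 1,000-proof batches:", snarksPerDay/assumedBatchSize) // 6000
}
```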

There are two parts that can be amortized:

  • State operations: several state reads and writes can be batched, performed once per ProveCommitBatched instead of once per sector (similar to the improvements in Add a batched PreCommitSectors method #25)
  • Aggregation: we currently batch-verify 10 SNARKs in a single ProveCommit; in this proposal the miner aggregates a large number of ProveCommits into a single ProveCommitAggregated message

Context on Groth16 aggregation

The Protocol Labs team, in collaboration with external researchers and engineers, has improved the performance of the Rust implementation of the IPP protocol (see ripp). The inner product pairing protocol for aggregating Groth16 proofs is described in this paper.

At a high level, the idea is the following: given X Groth16 proofs, we can generate a single short proof that all of them verify.

Our preliminary results show that a prover can aggregate up to ~65,000 SNARKs (~6,500 ProveCommits) into a proof of ~78kB with a verification time of ~150ms.

Note that this type of aggregation sits on top of the SNARKs we already produce. In other words, there is no need for a new trusted setup.

Comparison with #49 (Batching ProveCommits)

The size of an aggregated proof grows logarithmically in the number of proofs aggregated, unlike batching, where the proof size scales linearly.

In other words, proposal #49 (ProveCommitBatched) has a limit on the number of proofs that can be batched, while ProveCommitAggregatedIPP does not. This opens up the possibility for miners to submit a single daily proof of all new storage being added.
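
To make the scaling difference concrete, here is a toy size model. The 192-byte figure is the standard size of a single compressed Groth16 proof on BLS12-381; the logarithmic constants are assumptions roughly fitted to the benchmark CSV posted later in this thread, so treat the outputs as illustrative only:

```go
// Toy proof-size model: batched proofs grow linearly in the number of proofs, while an
// IPP aggregate grows logarithmically. The logarithmic constants are rough assumptions
// fitted to the benchmark CSV posted further down in this thread.
package main

import (
	"fmt"
	"math"
)

func main() {
	const groth16ProofBytes = 192.0     // one compressed Groth16 proof on BLS12-381 (2 G1 + 1 G2)
	base, perDoubling := 3800.0, 5900.0 // aggregateBytes(n) ≈ base + perDoubling*log2(n)

	for _, n := range []float64{8, 64, 512, 4096, 8192} {
		batched := groth16ProofBytes * n
		aggregated := base + perDoubling*math.Log2(n)
		fmt.Printf("n=%5.0f  batched: %9.0f B  aggregated: %6.0f B\n", n, batched, aggregated)
	}
}
```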

Outline

  • Implement and audit the IPP over Groth16
  • Implement ProveCommitAggregatedIPP, which allows miners to submit a single proof for multiple PoReps
  • Disable ProveCommit

With this mechanism, miners should always prefer to aggregate multiple proofs together, since doing so would substantially reduce their costs.

Discussion

This issue is intended for discussion. There are a number of details to work out before drafting a FIP.

Aggregation parameters

This is a test that aggregates 65,536 SNARKs (~6,500 PoReps) with a proof size of 78kB and a verification time of 197ms:

Proof aggregation finished in 41685ms
65536 proofs, each having 328 public inputs...
Verification aggregated finished in 197ms (Proof Size: 78336 bytes)
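
For intuition, here is the amortized per-SNARK cost implied by these numbers (plain arithmetic on the figures above; 192 bytes is the standard size of a single Groth16 proof on BLS12-381):

```go
// Amortized cost per SNARK implied by the test above: 65,536 SNARKs, a 78,336-byte
// aggregate proof and 197 ms of verification time.
package main

import "fmt"

func main() {
	const (
		numSNARKs      = 65536
		aggregateBytes = 78336
		verifyMillis   = 197.0
		groth16Bytes   = 192 // size of a single Groth16 proof if posted individually
	)

	fmt.Printf("on-chain bytes per SNARK: %.2f (vs %d individually)\n",
		float64(aggregateBytes)/numSNARKs, groth16Bytes) // ~1.20 B vs 192 B
	fmt.Printf("verification time per SNARK: %.1f µs\n", verifyMillis*1000/numSNARKs) // ~3.0 µs
}
```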

Open questions:

  • What is the range of possible optimizations in state operations and how much are we expecting to get?
  • What is the expected saving for such a change?

Note: there is a separate line of investigation into proposing ProveCommitAggregated with Halo instead of IPP; however, it seems that IPP will be ready for use sooner than Halo.

nicola (Contributor, Author) commented Dec 15, 2020

Note that this should be done in conjunction with #25

nicola (Contributor, Author) commented Feb 10, 2021

Update:

  • We should keep supporting the existing ProveCommit and add ProveCommitAggregated; full nodes will decide when to use it, and we will make clear recommendations
  • We should remove the parallel "batch" verification for ProveCommitBatched

nicola (Contributor, Author) commented Feb 17, 2021

Update:

  • We have run initial calculations from our ongoing IPP aggregation implementation that show potentially large gas savings for large proof aggregations (see the ProveCommitAggregate notebook)
  • We will soon release a document describing the construction and the proof for the community to review
  • The next step is to understand whether we can save gas by aggregating state operations

Reach out if you are interested in helping!

nicola (Contributor, Author) commented Feb 25, 2021

Update:

Questions to answer:

  • Proof versioning (cc @porcuquine @dignifiedquire )
    • Should ProveCommitSectorAggregate be available to existing actors? Yes => this will require mutating SectorPreCommitOnChainInfo, which needs more investigation.
    • How should we do this? We could introduce an aggregation proof type tied to ProveCommitAggregated invocations rather than to PreCommitInfos. Miners would pass this proof type in the parameters to ProveCommitAggregated. The syscall -> FFI would then be passed (1) the seal proof type describing the underlying sectors, as happens now, and (2) the aggregation proof type, allowing the aggregation construction to be versioned. This decoupling would solve the problem listed above and avoid a combinatorial explosion of proof types (32GiB-agg1, 32GiB-agg2, 2KiB-agg3, 64GiB-agg1, 64GiB-agg2, ...). A rough sketch of the decoupled parameters follows this list.
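
A rough sketch of what the decoupled parameters could look like (all type and field names here are hypothetical illustrations, not a proposed API; the actual signature would be worked out in the FIP draft):

```go
// Hypothetical sketch of the decoupling described above: the aggregation proof type is
// carried in the ProveCommitAggregated parameters, independently of the per-sector seal
// proof type recorded in SectorPreCommitOnChainInfo. Names are illustrative only.
package miner

type RegisteredSealProof int64        // per-sector PoRep proof type, as today
type RegisteredAggregationProof int64 // versions the aggregation construction (e.g. IPP v1)

type ProveCommitAggregatedParams struct {
	SectorNumbers  []uint64                   // sectors whose PoReps are aggregated
	AggregateProof []byte                     // the single IPP proof
	SealProofType  RegisteredSealProof        // describes the underlying sectors, as now
	AggregateType  RegisteredAggregationProof // passed through the syscall to the FFI
}
```

Keeping the aggregation proof type separate from the seal proof type means new aggregation versions can be introduced without multiplying the set of seal proof types.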

ZenGround0 (Contributor) commented:

For the aggregation versioning question, I am currently convinced that the correct thing to do is the solution described in the second bullet above: introduce an independent abi.RegisteredAggregationProof type and include it as an argument to ProveCommitAggregated.

ZenGround0 (Contributor) commented:

We've used the specs-actors prototype in 1381 to compare gas usage between the current prove-commit entrypoint and the proposed aggregate entrypoint. Reposting this summary from Slack:

PR 1381 contains two tests which measure the gas cost of prove-committing x sectors using the current strategy (x invocations of ProveCommit) and the new prototype ProveCommitAggregate. The highlights from the results:

  • The minimum number of PoReps needed for aggregation to break even on miner gas cost is between 6 and 13
  • Miners spend about 2.5x less gas per sector when aggregating in batches of 26, 10x less for batches of 200, and 18x less for batches of 800
  • Aggregating 819 sectors costs ~20% of the block gas limit. Extrapolating trends, aggregating 3,000 sectors should cost about 50% of the block gas limit, which can serve as a safe upper bound on aggregate batch size (a toy model that roughly reproduces these figures follows this list)
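
A toy fixed-plus-marginal gas model helps show how these figures hang together; the constants below are assumptions chosen to roughly reproduce the reported numbers, not measured values:

```go
// Toy per-sector gas model: aggregate cost per sector ≈ fixed/n + marginal, in units where a
// single ProveCommit costs 1.0 per sector. The constants are assumptions picked to roughly
// match the reported break-even point and savings multipliers (the largest batch comes out a
// bit low in this crude model, since real measurements also amortize state costs).
package main

import "fmt"

func main() {
	const (
		fixed    = 9.0  // per-message overhead of one ProveCommitAggregate (relative units)
		marginal = 0.05 // per-sector cost inside an aggregate (relative units)
	)

	// Aggregation wins once fixed/n + marginal < 1.0, i.e. n > fixed/(1-marginal).
	fmt.Printf("break-even batch size: ~%.0f sectors\n", fixed/(1-marginal)) // ~9, within the 6-13 range

	for _, n := range []float64{26, 200, 800} {
		fmt.Printf("batch of %4.0f: ~%.1fx less gas per sector\n", n, 1/(fixed/n+marginal))
	}
}
```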

Biggest risks:

  • I haven't been including message-size overhead, only the expected proof bytes, which slightly counts against the aggregate PoRep's advantage
  • The measurements assume contiguous bitfields, which is not always what miners opt into, meaning message sizes will be bigger in practice, also counting against the aggregate PoRep's advantage
  • The measurements all use an empty miner actor. IPLD costs will increase with large sector AMTs and other miner state. I plan to measure inserts against miners with many existing prove-commits as the next step.
  • Mainnet commonly sees ~50M gas expenditures in ConfirmSectorsProofsValid, whereas these measurements show <20M at mainnet batch sizes. This is potentially because of the previous point.
  • The prototype code is not thoroughly vetted, so security/correctness issues could increase gas usage; for example, we may need to call into the power actor to check claim existence. However, all known/expected changes are anticipated to cost small amounts of gas.

Biggest opportunities:

  • The prototype code is not heavily optimized. One straightforward optimization is to batch calls to market.ComputeDataCommitment; this method is a significant bottleneck, especially as batch sizes rise (about 1 billion gas of IPLD + call gas in the 819-sector aggregate case). Batching the calls alone would save significant gas (23M in the 819-sector case). Batch-loading the market data structures would probably also give significant savings. Finally, we can optimize the common case of CC sectors, up to adding a single syscall inside ProveCommitAggregate that gets all CC commDs in one syscall without going to the market actor at all.
  • To protect against noise in the aggregate PoRep measurements, I added ~5ms to the aggregation verification time. This means the actual advantage is a bit better than the data is showing, provided the PoRep verification times in the Observable notebook are accurate.

Aggregate PoRep vs Current Porep - Sheet1.pdf

nikkolasg (Contributor) commented:

I have changed the benchmark so it repeats the operations a few times to smooth out the results. Here is the resulting CSV file.
TL;DR: these results are a bit faster than the previous ones, also thanks to the latest optimizations, and they show a bigger gap versus "multiple batch verifications of 10 proofs each", which is the current Lotus behavior. So we should expect larger gas reductions from these numbers.
Note that these were run on a 32-core/64-thread Threadripper machine, so parallelism plays a big role here.

nproofs,aggregate_create_ms,aggregate_verify_ms,batch_verify_ms,batch_all_ms,aggregate_size_bytes,batch_size_bytes
8,82,14,6,5,21328,1536
16,123,15,6,5,27184,3072
32,197,15,9,8,33040,6144
64,317,15,12,12,38896,12288
128,493,18,17,19,44752,24576
256,840,21,27,32,50608,49152
512,1292,21,50,46,56464,98304
1024,2362,23,87,77,62320,196608
2048,4309,28,165,147,68176,393216
4096,8020,28,321,248,74032,786432
8192,15395,31,656,491,79888,1572864
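
Two observations worth pulling out of this table (plain arithmetic on the CSV values above):

```go
// Reading the CSV above: aggregate proof size grows by a roughly constant ~5.9 kB per
// doubling of nproofs (i.e. logarithmically), while batch proof size is exactly
// 192 bytes * nproofs (linear); and aggregate verification time overtakes batch
// verification somewhere between 128 and 256 proofs.
package main

import "fmt"

func main() {
	nproofs := []int{8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192}
	aggBytes := []int{21328, 27184, 33040, 38896, 44752, 50608, 56464, 62320, 68176, 74032, 79888}
	aggVerifyMs := []int{14, 15, 15, 15, 18, 21, 21, 23, 28, 28, 31}
	batchVerifyMs := []int{6, 6, 9, 12, 17, 27, 50, 87, 165, 321, 656}

	for i := 1; i < len(nproofs); i++ {
		fmt.Printf("n=%5d: aggregate grew by %d B over the previous doubling; batch size is %d B\n",
			nproofs[i], aggBytes[i]-aggBytes[i-1], 192*nproofs[i])
	}
	for i, n := range nproofs {
		if aggVerifyMs[i] < batchVerifyMs[i] {
			fmt.Printf("aggregate verification first beats batch verification at n=%d\n", n)
			break
		}
	}
}
```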

nicola (Contributor, Author) commented Mar 2, 2021

Updated the benchmark notebook: https://observablehq.com/@protocol/provecommitaggregate-notebook; hopefully this gives much better results!

Pegasus-starry commented Jul 2, 2021

Hi, I have hit a panic where commit aggregation fails. Can anyone help me solve it? Thanks.

{"level":"warn","ts":"2021-07-02T12:08:48.101+0800","logger":"sectors","caller":"storage-sealing/fsm.go:627","msg":"sector 2393 got error event sealing.SectorCommitFailed: aggregate error: aggregating proofs: Rust panic: Once instance has previously been poisoned"}

{"level":"warn","ts":"2021-07-02T12:04:52.703+0800","logger":"sectors","caller":"storage-sealing/fsm.go:627","msg":"sector 2396 got error event sealing.SectorCommitFailed: aggregate error: aggregating proofs: Rust panic: no unwind information"}

kaitlin-beegle (Contributor) commented:

Marking as inactive due to the passage of FIP-0013

kaitlin-beegle added the Technical - Actors and FIP0013 labels on Aug 13, 2021