Scaling FIL+ Allocation Mechanism #708
-
Thanks @nikkolasg, this sounds great. I think the changes to the verified registry actor to support this would be fairly straightforward. There are some policy decisions to work out about how to treat partial claims, loss of one part of a claim, etc., but I don't think they need be too complicated. A sketch of these is in this earlier design for FIP-0045. This will all work great from the verified registry side. Actually realising it is blocked by the built-in storage market actor's privileged position as mediator of all sector content. At present, everything to do with sector content must go through the built-in market actor, which is not prepared to handle deals that are larger than a sector. My opinion is that the built-in storage market actor should be removed from this privileged position; the associated API changes would then open this scalable FIL+ allocation mechanism directly to clients and SPs, or to user-programmed contracts as mediators. See #298, "Free CommD", and this glimpse of miner protocol change dependencies.
-
Update: added an estimated gas cost comparison graph.
-
@nikkolasg - thanks for putting this together. There's definitely concern right now about the costs of announcing/publishing verified deals to the network, and with FIP-0045 already implemented, this should reduce some of that cost. Are you able to join us at the next Fil+ Notary Governance call on Jun 6 at 1500 UTC to present this to the Fil+ community? See filecoin-project/notary-governance#885 for details. cc @Kevin-FF-USA as FYI
-
Would "anyone" be able to associate C with the content of the data in the way that a CID can? Is it relatively straightforward to map C to the CID of the content?
-
Updated the discussion with raw gas cost improvements for the allocation-claiming process.
-
TL;DR: This new mechanism could reduce the gas cost associated with the allocation-claiming process by 34% to 50%.
Problem
The FIL+ allocation strategy is slow, costly and does not scale onchain.
Let’s think of datacap as a non-transferable token for simplicity. Currently, at a high level, FIL+ allocation looks like the following:
Each allocation is for only one sector at a time.
Note that all of these operations can be done in batch, but there is still per-sector overhead for each allocation and claim.
Goal
We want to change the process so that clients can create a single allocation for data that will span multiple sectors.
The primary goal is to reduce the workload/cost on the client (the second step mentioned in the section above). This proposal also makes it possible to reduce the workload/cost on the SP by claiming multiple allocations at once, but that is not required.
Proposal
We propose the client use a commitment to represent a set of Piece CIDs (CommPs) that the client wishes to put on Filecoin.
The client can then do a single allocation for this set of CommPs.
The SP can then claim the set of CommPs either incrementally (one by one) or in batches.
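As a rough illustration (not part of the proposal; every name and field below is hypothetical, not a proposed actor type), the on-chain objects could look something like this in Rust:

```rust
// Hypothetical sketch of what a set-level allocation and claim could
// carry on-chain. The actual verified-registry types would be decided
// in a FIP.

/// One allocation covering a whole set of pieces, spanning many sectors.
pub struct SetAllocation {
    pub client: u64,          // actor ID of the client spending datacap
    pub commitment: [u8; 48], // KZG commitment C to the ordered set of CommPs
    pub piece_count: u64,     // how many CommPs are committed in C
    pub total_size: u64,      // total padded size, debited from datacap once
    pub expiration: i64,      // epoch by which the pieces must be claimed
}

/// One claim by an SP for a subset of the committed pieces.
pub struct SetClaim {
    pub provider: u64,         // actor ID of the claiming SP
    pub indices: Vec<u64>,     // positions of the claimed CommPs within C
    pub pieces: Vec<[u8; 32]>, // the claimed CommPs, as field elements
    pub proof: [u8; 48],       // one constant-size KZG batch-opening proof
}
```

The key point is that `SetClaim.proof` stays 48 bytes whether the SP claims one piece or the whole set.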
A proposed workflow for the allocation is as follows:
Once that is done, the claims process works in the following way:
Commitment
We propose to use a simple commitment scheme, the KZG commitment. It is an algebraic commitment that enables constant-size opening proofs (80 B).
These commitments are seeing more and more use, and library support is already present. For example, danksharding and the new state trie in Eth2 will be using KZG commitments all over the place.
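For concreteness, here is the standard KZG scheme in outline (the notation is ours; the proposal only relies on the commit/open/verify interface):

```latex
% Trusted setup: powers of a secret \tau in both pairing groups.
% Commit to the polynomial \phi that interpolates the CommPs, \phi(i) = p_i:
C = [\phi(\tau)]_1
% Open at index i with value p_i via the quotient polynomial q:
q(X) = \frac{\phi(X) - p_i}{X - i}, \qquad \pi = [q(\tau)]_1
% Verify with a single pairing check:
e\big(C - [p_i]_1,\ [1]_2\big) = e\big(\pi,\ [\tau - i]_2\big)
```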
Why a KZG commitment?
Constant-size proofs are one major advantage: because the scheme supports batch openings, the proof stays constant-size regardless of the number of allocations the SP is claiming.
This is not the case with a Merkle tree, for example.
Another reason is that having an algebraic commitment in Filecoin opens the door to much better scalability and interoperability. In particular, these commitments are cheap to aggregate and to verify.
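The batch opening mentioned above works as follows (again standard KZG, our notation): to open a whole set S of indices at once, the SP publishes one group element.

```latex
% r interpolates the claimed values on S; Z_S is the vanishing polynomial of S:
r(i) = p_i \ \ \forall i \in S, \qquad Z_S(X) = \prod_{i \in S} (X - i)
% One constant-size proof covers all of S:
\pi_S = \left[ \frac{\phi(\tau) - r(\tau)}{Z_S(\tau)} \right]_1
% Verified with a single pairing check:
e\big(C - [r(\tau)]_1,\ [1]_2\big) = e\big(\pi_S,\ [Z_S(\tau)]_2\big)
```

The verifier computes [r(\tau)]_1 and [Z_S(\tau)]_2 from the public setup with a handful of scalar multiplications, which is where the per-opening scalar-mul cost in the estimates below comes from.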
Why not use a Merkle tree?
A Merkle tree would be cheaper to execute from the client's perspective, but it would use more non-indexed storage. In other words, the gas usage would be higher.
Gas Cost Reduction vs the Current Method
The following graph shows the reduction in gas cost between the current method and what it could be with the KZG batched method (raw spreadsheet).
One concrete number: between 34% and 50% gas reduction when making 64 claims at once.
Commitment Comparison
This table shows an example of an opening proof covering 64 CommPs, comparing a Merkle tree and a KZG commitment scheme:
As you can see, it’s a trade-off between onchain footprint and verification cost. We believe that for small sizes like this, the onchain verification cost is still reasonable and KZG brings the best trade-off.
Gas Comparison
This is a rough estimation of the gas cost for both methods (Merkle tree vs KZG) according to the number of openings one is doing (spreadsheet):
KZG is cheaper onchain as soon as one makes 20-32 batch openings.
This estimation takes a pessimistic number for KZG. For example, the gas for KZG should hover around 10M, but this estimation takes it to 12M to account for unexpected costs.
Assumptions taken to compute the cost:
* Overall cost for syscalls:
  * 10 gas/ns is the translation factor
  * 20k gas overhead per syscall
  * 0.6 gas per byte
* Overall cost for storage:
  * 1300 gas/byte (non-indexed storage), 3400 gas/byte (indexed)
* KZG cost (syscall): approx. 10M gas
  * pairing: 1 ms of computation → 10M gas
  * scalar mul: 373 µs → 373k gas
  * KZG takes a 48-byte proof + indices + values: X * 32 * 2 * 0.6 gas, ~2k gas for 64 openings
* KZG cost (storage): 48 B
* Merkle tree cost for 1 opening: 320 B * 1300 = 416k gas
* Merkle tree cost for 64 openings: 320 * 64 * 1300 ≈ 26M gas
* NOT taking into account the storage of the actual CommP values and indices, since that is the same in both cases
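As a sanity check on the 20-32 crossover figure above (our arithmetic, using these assumptions): the Merkle cost grows linearly with the number of openings n, while the KZG cost is roughly constant.

```latex
\text{Merkle}(n) \approx n \cdot 320\,\text{B} \cdot 1300\,\text{gas/B} = 416\text{k} \cdot n
\qquad
\text{KZG}(n) \approx 10\text{M--}12\text{M} + n \cdot 32 \cdot 2 \cdot 0.6 \approx 10\text{M--}12\text{M}
```

Setting 416k · n above 10M-12M gives n ≈ 24-29 openings, consistent with the quoted 20-32 range.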
Implementation
A KZG commitment would require a precompile to cheaply verify it onchain. This precompile would use blst as the backend, which already provides many of the building blocks to verify KZG commitments (and many other libraries building on danksharding for Eth2 are popping up with their own KZG implementations). More work is needed to explore the space of libraries and decide whether we want to integrate one or use our own implementation (a whole implementation fits in < 500 lines).
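To make the shape of that precompile concrete, here is a minimal sketch of its interface; everything in it is hypothetical (the names, types, and syscall boundary are ours, not a proposed FVM API):

```rust
/// Hypothetical precompile interface for KZG batch-opening verification.
/// The real backend would be built on blst; this sketch only shows the
/// data that crosses the syscall boundary.

/// A KZG commitment C: one compressed BLS12-381 G1 point (48 bytes).
pub struct KzgCommitment(pub [u8; 48]);

/// A batch-opening proof: also one compressed G1 point (48 bytes),
/// regardless of how many indices are opened at once.
pub struct KzgProof(pub [u8; 48]);

/// Check that `commitment` opens to `values` at `indices`.
/// Each value is a CommP mapped into the BLS12-381 scalar field.
pub fn verify_batch_opening(
    commitment: &KzgCommitment,
    indices: &[u64],
    values: &[[u8; 32]],
    proof: &KzgProof,
) -> bool {
    // The actual check is the pairing equation from the batch-opening
    // section above:
    //   e(C - [r(tau)]_1, [1]_2) == e(proof, [Z_S(tau)]_2)
    // where r interpolates `values` over `indices` and Z_S is the
    // vanishing polynomial of `indices`. Computing [r(tau)]_1 and
    // [Z_S(tau)]_2 costs a few scalar multiplications, negligible
    // next to the two pairings.
    let _ = (commitment, indices, values, proof);
    unimplemented!("backed by a blst-based precompile")
}
```

An SP claiming 64 allocations would make one such call with 64 indices and values and a single 48-byte proof.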
Note that we could also decide to use another curve, BN254, which is faster than BLS12-381 and, more importantly, is available on any EVM chain! However, this curve has fewer implementations out there.
Discussion
Checking the commitment between client and SP
There is an alternative process, which we describe here for completeness, that essentially makes sure the client and SP agree on C offchain first.
Because the client and SP have an incentive to work together, it might be better to have them check the commitment C offchain first. This is up for discussion.
Interoperability with other chains
Because this proposal does not rely on any specific encoding (IPLD etc.), a client can create their commitment on another chain or on IPC subnets, as long as the precompile is enabled there.
Given that we are “creating a new state” of sorts, the ability to use a KZG commitment scheme (with known curves) would allow us to use and refer to it in contexts other than Filecoin. For example, it would be “possible” to create these commitments on any Ethereum-related chain (once this EIP is included, or already today if we use BN254).
Different PoReps
In case PoRep changes, this proposal would still be compatible, as it does not interpret the CommP in any way (i.e. we could change it to use blake3, for example).
If the sector size changes, this proposal still works. With smaller sector sizes it would simply operate on a larger set of CommPs (e.g. going from 1024 CommPs to 2048 smaller CommPs); if sector sizes grow larger, it would operate on a smaller set of CommPs (e.g. going from 1024 to 512).
Thanks to @Kubuxu for the initial idea and long discussions, and @anorth for feedback and reviews!