Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify client codec #1105

Closed
wants to merge 16 commits into from
Closed

Conversation

litt3
Copy link
Contributor

@litt3 litt3 commented Jan 14, 2025

Why are these changes needed?

  • The combination of blob encoding + IFFT/FFT yields a confusing result
  • This PR splits the fundamentally unrelated concepts apart: all payloads must be encoded. After that, we decide based on configuration whether the data must be IFFTed or not

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

litt3 added 12 commits January 14, 2025 10:30
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
@litt3 litt3 self-assigned this Jan 14, 2025
@litt3 litt3 requested a review from samlaf January 14, 2025 17:29
@litt3 litt3 requested a review from cody-littley January 14, 2025 20:48
@litt3 litt3 marked this pull request as ready for review January 14, 2025 22:01
@litt3 litt3 requested review from ian-shim and bxue-l2 January 14, 2025 22:01
Copy link
Contributor

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall big big fan of these changes. Much needed. I personally think we can lean in even higher on the type system and define types for EncodedPayload and Blob. I really don't like looking at a bunch of functions that take a []byte and return another []byte. Wdyt?

api/clients/codecs/blob_codec.go Outdated Show resolved Hide resolved
Comment on lines 21 to 22
// The returned bytes may be interpreted as a polynomial in Eval form, where each contained field element of
// length 32 represents the evaluation of the polynomial at an expanded root of unity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could also be interpreted in coeff form right? Like we had in the optimal disperse route at the top of
image

Also is it really at expanded roots of unity? This part I'm not sure. cc @bxue-l2

Copy link
Contributor Author

@litt3 litt3 Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could also be interpreted in coeff form right

This is a tricky question, let me explain my reasoning for wording it this way.

From the perspective of the client, blob polynomials may either be dispersed in Eval form, or Coeff form. Regardless of the form the client chooses, however, the disperser chooses to interpret what it receives as a polynomial in Coeff form.

client: "EncodePayload returns a polynomial in Eval form. Depending on my configuration, I may choose to directly disperse the blob in Eval form, or I may convert the polynomial to Coeff form" first.

disperser: "No matter what the client sends me, I will interpret it as Coeff form"


One argument for this set of definitions is that it yields meaningful names for the enum which determines pre-dispersal processing: Eval means don't IFFT, and Coeff means do IFFT. If we were to define EncodePayload, such that what it returns is considered to be Coeff form in the case of "optimal dispersal", that means the polynomial sent from the client to the disperser is always in Coeff form. In this case, we would need to think of alternate names for the enum values.

Essentially, the question boils down to this: in the alternate "optimal" dispersal pathway, at what point in time do we declare that the non-iffted bytes are in Coeff form? Does the client make this declaration, or does the disperser? I chose for the disperser to make this declaration in my descriptions, but I think it's a pretty arbitrary decision. We just need to come to consensus, and stick to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is it really at expanded roots of unity?

I think so, but confidence is lacking. Waiting for clarification from @bxue-l2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome summary, let's definitely chat about this over standup with everyone, there's a lot going on which I need to think about more deeply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the roots_of_unity, having fought in our rust PRs over the meaning of primitive_roots_of_unity, roots_of_unity, expanded_roots_of_unity, I think the semantics is something like:

  1. We need roots of unity that form a subgroup of the order equal to the number of FEs in our polynomial/blob
  2. We store the 29 primitive_roots_of_unity as const in the code, which, when expanded, each generate a group of order 2^index of that PROU (2^0, 2^1, 2^2, etc.)

So if my current understand is correct (cc @bxue-l2 @anupsv), then we should not refer to primitive/expanded in these kinds of docs, just roots_of_unity. expanded just refers to the procedure we have to take because of the form we decided to cache the roots in. We could also just have stored every single root of unity for every power directly in the binary and used those directly without needing to expand a primitive root.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is right.

Having roots_of_unity and expanded_roots_of_unity are sufficient.

See wiki.
which are 1, 2^(2^1), 2^(2^2), 2^(2^3)....2^(2^28). The exponent is power of 2 because it needs to work with FFT

Expanded ROU based on 2^(2^1) is [1, w, w^2, w^3]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a slight different idea on the following statement

EncodePayload returns a polynomial in Eval form

The client can decide which form it wants the EncodedPayload to be interpreted. If CoeffForm, then the EncodedPayload will go to the optimal path, otherwise in the EvalForm, the EncodedPayload needs to undergo IFFT

Comment on lines 24 to 26
// The returned bytes may or may not represent a blob. If the system is configured to distribute blobs in Coeff form,
// then the data returned from this function must be IFFTed to produce the final blob. If the system is configured to
// distribute blobs in Eval form, then the data returned from this function is the final blob representation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to be using blob here, then we have to be 100% precise and link to its definition somewhere in the codebase.

Also what does "system is configured mean"? Isn't the system eigenda? And eigenda always interprets blobs as being in coeff form?

Copy link
Contributor Author

@litt3 litt3 Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we have to be 100% precise and link to its definition somewhere in the codebase

Do we have any precise definitions of a blob in the codebase yet?

system is configured mean

I intended this to mean "however the various clients are configured to disperse and retrieve blobs". Do you think it is better if I say:

"If clients are configured to distribute blobs in Coeff form,..."?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think just using "clients are configured" is cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not have precise definition of blob in the codebase. Ideally we'll have a spec soon enough that we can point to though perhaps

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My definition of a blob is the non blobHeader part that is received by the disperser. If the length of the received part is not power of 2, we artificially pad it so. So blob is always power of 2

api/clients/codecs/blob_codec.go Outdated Show resolved Hide resolved
api/clients/codecs/blob_codec.go Outdated Show resolved Hide resolved
Comment on lines 15 to 16
// Number of test iterations
iterations := testRandom.Intn(100) + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit weird. Why not just always test the same number of iterations?

Comment on lines 11 to 12
// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
func TestFFT(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
func TestFFT(t *testing.T) {
// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
// TODO: we should probably be using fuzzing instead of this kind of ad-hoc random search testing
func TestFFT(t *testing.T) {

}

var codec codecs.BlobCodec
var polynomialForm codecs.PolynomialForm
if config.DisablePointVerificationMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good time to update this weirdly named flag to something more explicit like the polynomial form being passed directly?

Feel like the fact that coeff form has point verification feature is VERY context dependent on understanding the way the eigenda backend is doing things etc. This should prob just be documented in the ENUM and flag, and then users can choose which form they want to use accordingly?

Comment on lines 161 to 169
encodedPayload, err := m.blobToEncodedPayload(data)
if err != nil {
return nil, fmt.Errorf("error decoding blob: %w", err)
return nil, fmt.Errorf("blob to encoded payload: %w", err)
}

return decodedData, nil
decodedPayload, err := codecs.DecodePayload(encodedPayload)
if err != nil {
return nil, fmt.Errorf("error decoding payload: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super clean! We might even want a single function like blobToPayload which does both operations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also might want to use blob instead of data for the thing received by RetrieveBlob?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a blob always power of 2?

Comment on lines +416 to +417
// If the system is configured to distribute blobs in Eval form, the returned bytes exactly match the blob bytes.
// If the system is configured to distribute blobs in Coeff form, the blob is FFTed before being returned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the system is configured to distribute blobs in Eval form, the returned bytes exactly match the blob bytes.
// If the system is configured to distribute blobs in Coeff form, the blob is FFTed before being returned
// If the EigenDAClient is configured to distribute blobs in Eval form, the returned bytes exactly match the blob bytes.
// If the EigenDAClient is configured to distribute blobs in Coeff form, the blob is FFTed before being returned

litt3 added 3 commits January 15, 2025 10:19
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
Signed-off-by: litt3 <102969658+litt3@users.noreply.github.com>
@litt3 litt3 marked this pull request as draft January 16, 2025 14:00
@litt3 litt3 closed this Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants