
Simplify client codec #1105

Closed
wants to merge 16 commits into from
77 changes: 51 additions & 26 deletions api/clients/codecs/blob_codec.go
@@ -1,48 +1,73 @@
 package codecs
 
 import (
+    "bytes"
+    "encoding/binary"
     "fmt"
 
+    "github.com/Layr-Labs/eigenda/encoding/utils/codec"
 )
 
 type BlobEncodingVersion byte
 
 const (
-    // This minimal blob encoding contains a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
+    // DefaultBlobEncoding entails a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
     // followed by the encoded data [0x00, 31 bytes of data, 0x00, 31 bytes of data,...]
     DefaultBlobEncoding BlobEncodingVersion = 0x0
 )
 
-type BlobCodec interface {
-    DecodeBlob(encodedData []byte) ([]byte, error)
-    EncodeBlob(rawData []byte) ([]byte, error)
-}
+// EncodePayload accepts an arbitrary payload byte array, and encodes it.
[Review comment — Contributor]
Let's be super precise in what "encode" means. What's the length of the encoded thing? Is it a multiple of 32, or can the last element not be? etc.
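To make the length question concrete, here is a standalone sketch (not part of the PR; `encodedLength` is a hypothetical helper) of the arithmetic implied by the header comment: 32 header bytes, plus one 32-byte field element for every 31-byte chunk of payload. Under this scheme the encoded result is always a multiple of 32 bytes, though not necessarily a power of 2.

```go
package main

import "fmt"

// encodedLength computes the size in bytes of an encoded payload under the
// scheme described in the header comment: a 32-byte header, followed by one
// 32-byte field element per 31-byte payload chunk (the extra byte is the 0x00
// prefix that keeps each element below the bn254 modulus).
func encodedLength(payloadLen int) int {
	numChunks := (payloadLen + 30) / 31 // ceil(payloadLen / 31)
	return 32 + numChunks*32
}

func main() {
	for _, n := range []int{0, 1, 31, 32, 62, 100} {
		fmt.Printf("payload %3d bytes -> encoded %3d bytes\n", n, encodedLength(n))
	}
}
```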

//
// The returned bytes may be interpreted as a polynomial in Eval form, where each contained field element of
// length 32 represents the evaluation of the polynomial at an expanded root of unity
[Review comment — Contributor]
They could also be interpreted in coeff form right? Like we had in the optimal disperse route at the top of [image]

Also is it really at expanded roots of unity? This part I'm not sure. cc @bxue-l2

[Review comment — Contributor Author, @litt3, Jan 15, 2025]
> They could also be interpreted in coeff form right

This is a tricky question; let me explain my reasoning for wording it this way.

From the perspective of the client, blob polynomials may be dispersed in either Eval form or Coeff form. Regardless of the form the client chooses, however, the disperser interprets what it receives as a polynomial in Coeff form.

client: "EncodePayload returns a polynomial in Eval form. Depending on my configuration, I may choose to directly disperse the blob in Eval form, or I may first convert the polynomial to Coeff form."

disperser: "No matter what the client sends me, I will interpret it as Coeff form."

One argument for this set of definitions is that it yields meaningful names for the enum which determines pre-dispersal processing: Eval means don't IFFT, and Coeff means do IFFT. If we were to define EncodePayload such that what it returns is considered Coeff form in the case of "optimal dispersal", then the polynomial sent from the client to the disperser would always be in Coeff form, and we would need to think of alternate names for the enum values.

Essentially, the question boils down to this: in the alternate "optimal" dispersal pathway, at what point in time do we declare that the non-IFFTed bytes are in Coeff form? Does the client make this declaration, or does the disperser? I chose for the disperser to make this declaration in my descriptions, but I think it's a fairly arbitrary decision. We just need to come to consensus and stick to it.
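The convention under discussion can be sketched as follows. `PolynomialForm`, `Eval`, and `Coeff` mirror the enum added in this PR; `ifft` and `toBlob` are hypothetical stand-ins for illustration, not code from the PR:

```go
package main

import "fmt"

// PolynomialForm mirrors the enum added in this PR.
type PolynomialForm uint

const (
	Eval PolynomialForm = iota
	Coeff
)

// ifft is a stand-in for the real IFFT over bn254 field elements; the real
// implementation interpolates coefficients from evaluations.
func ifft(evalPoly []byte) []byte {
	return evalPoly
}

// toBlob illustrates the convention: EncodePayload always yields Eval form;
// whether an IFFT happens before dispersal depends on the configured form,
// and the disperser always treats whatever it receives as Coeff form.
func toBlob(encodedPayload []byte, form PolynomialForm) []byte {
	switch form {
	case Coeff:
		return ifft(encodedPayload) // convert Eval -> Coeff before dispersal
	case Eval:
		return encodedPayload // disperse as-is; disperser reinterprets as Coeff
	}
	return nil
}

func main() {
	payload := []byte{1, 2, 3}
	fmt.Println(len(toBlob(payload, Eval)), len(toBlob(payload, Coeff)))
}
```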

[Review comment — Contributor Author]
> Also is it really at expanded roots of unity?

I think so, but confidence is lacking. Waiting for clarification from @bxue-l2.

[Review comment — Contributor]
Awesome summary; let's definitely chat about this over standup with everyone. There's a lot going on which I need to think about more deeply.

[Review comment — Contributor]
Regarding the roots_of_unity: having fought in our rust PRs over the meaning of primitive_roots_of_unity, roots_of_unity, and expanded_roots_of_unity, I think the semantics are something like:

  1. We need roots of unity that form a subgroup of order equal to the number of FEs in our polynomial/blob
  2. We store the 29 primitive_roots_of_unity as consts in the code, which, when expanded, each generate a group of order 2^index of that PROU (2^0, 2^1, 2^2, etc.)

So if my current understanding is correct (cc @bxue-l2 @anupsv), then we should not refer to primitive/expanded in these kinds of docs, just roots_of_unity. "Expanded" just refers to the procedure we have to take because of the form we decided to cache the roots in. We could also just have stored every single root of unity for every power directly in the binary and used those directly, without needing to expand a primitive root.

[Review comment — Contributor]
Yes, that is right.

Having roots_of_unity and expanded_roots_of_unity is sufficient.

See the wiki. The primitive roots of unity are 1, 2^(2^1), 2^(2^2), 2^(2^3), ..., 2^(2^28). The exponent is a power of 2 because it needs to work with FFT.

An expanded ROU based on 2^(2^1) is [1, w, w^2, w^3].
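As a toy illustration of "expanding" a root of unity into the cyclic group it generates — using the small prime field F_17 rather than bn254, with `expand` as a hypothetical helper, not code from this repository:

```go
package main

import "fmt"

// expand generates the cyclic group produced by a root of unity w modulo p,
// i.e. [1, w, w^2, ...], stopping when the powers wrap back around to 1.
func expand(w, p uint64) []uint64 {
	group := []uint64{1}
	for x := w; x != 1; x = x * w % p {
		group = append(group, x)
	}
	return group
}

func main() {
	// In F_17 the multiplicative group has order 16, and 4 is a primitive
	// 4th root of unity (4^4 = 256 = 1 mod 17). Expanding it yields the
	// order-4 subgroup, analogous to expanding a cached PROU over bn254.
	fmt.Println(expand(4, 17)) // prints [1 4 16 13]
}
```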

[Review comment — Contributor]
I have a slightly different idea on the following statement:

> EncodePayload returns a polynomial in Eval form

The client can decide which form it wants the EncodedPayload to be interpreted as. If Coeff form, then the EncodedPayload goes down the optimal path; otherwise, in Eval form, the EncodedPayload needs to undergo an IFFT.

//
// The returned bytes may or may not represent a blob. If the system is configured to distribute blobs in Coeff form,
// then the data returned from this function must be IFFTed to produce the final blob. If the system is configured to
// distribute blobs in Eval form, then the data returned from this function is the final blob representation.
[Review comment — Contributor]
If we're going to be using "blob" here, then we have to be 100% precise and link to its definition somewhere in the codebase.

Also, what does "system is configured" mean? Isn't the system EigenDA? And EigenDA always interprets blobs as being in Coeff form?

[Review comment — Contributor Author, @litt3, Jan 15, 2025]
> then we have to be 100% precise and link to its definition somewhere in the codebase

Do we have any precise definitions of a blob in the codebase yet?

> system is configured mean

I intended this to mean "however the various clients are configured to disperse and retrieve blobs". Do you think it is better if I say: "If clients are configured to distribute blobs in Coeff form, ..."?

[Review comment — Contributor]
Yes, I think just using "clients are configured" is cleaner.

[Review comment — Contributor]
We might not have a precise definition of blob in the codebase. Ideally we'll have a spec soon enough that we can point to, though.

[Review comment — Contributor]
My definition of a blob is the non-blobHeader part that is received by the disperser. If the length of the received part is not a power of 2, we artificially pad it so. So a blob is always a power of 2 in length.
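The padding described in this comment can be sketched as follows (`nextPowerOfTwo` is a hypothetical helper for illustration, not the codebase's implementation):

```go
package main

import "fmt"

// nextPowerOfTwo returns the smallest power of 2 that is >= n, illustrating
// the padding described above: the disperser rounds a blob's length up to a
// power of 2 so that it works with the FFT machinery.
func nextPowerOfTwo(n int) int {
	p := 1
	for p < n {
		p <<= 1
	}
	return p
}

func main() {
	for _, n := range []int{1, 3, 4, 100} {
		fmt.Printf("%3d field elements -> padded to %3d\n", n, nextPowerOfTwo(n))
	}
}
```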

+func EncodePayload(payload []byte) []byte {
+    payloadHeader := make([]byte, 32)
+    // first byte is always 0 to ensure the payloadHeader is a valid bn254 element
+    payloadHeader[1] = byte(DefaultBlobEncoding) // encode version byte
+
+    // encode payload length as uint32
+    binary.BigEndian.PutUint32(
+        payloadHeader[2:6],
+        uint32(len(payload))) // uint32 should be more than enough to store the length (approx 4gb)
+
+    // encode payload modulo bn254
+    // the resulting bytes subsequently may be treated as the evaluation of a polynomial
+    polynomialEval := codec.ConvertByPaddingEmptyByte(payload)
+
+    encodedPayload := append(payloadHeader, polynomialEval...)
+
+    return encodedPayload
+}
-
-func BlobEncodingVersionToCodec(version BlobEncodingVersion) (BlobCodec, error) {
-    switch version {
-    case DefaultBlobEncoding:
-        return DefaultBlobCodec{}, nil
-    default:
-        return nil, fmt.Errorf("unsupported blob encoding version: %x", version)
-    }
-}

-func GenericDecodeBlob(data []byte) ([]byte, error) {
-    if len(data) <= 32 {
-        return nil, fmt.Errorf("data is not of length greater than 32 bytes: %d", len(data))
-    }
-    // version byte is stored in [1], because [0] is always 0 to ensure the codecBlobHeader is a valid bn254 element
-    // see https://github.com/Layr-Labs/eigenda/blob/master/api/clients/codecs/default_blob_codec.go#L21
-    // TODO: we should prob be working over a struct with methods such as GetBlobEncodingVersion() to prevent index errors
-    version := BlobEncodingVersion(data[1])
-    codec, err := BlobEncodingVersionToCodec(version)
-    if err != nil {
-        return nil, err
-    }
-
-    data, err = codec.DecodeBlob(data)
-    if err != nil {
-        return nil, fmt.Errorf("unable to decode blob: %w", err)
-    }
-
-    return data, nil
-}
+// DecodePayload accepts bytes representing an encoded payload, and returns the decoded payload
+//
+// This function expects the parameter bytes to be a polynomial in Eval form. In other words, if blobs in the system
+// are being distributed in Coeff form, a blob must be FFTed prior to being passed into the function.
+func DecodePayload(encodedPayload []byte) ([]byte, error) {
+    if len(encodedPayload) < 32 {
+        return nil, fmt.Errorf("encoded payload does not contain 32 header bytes, meaning it is malformed")
+    }
+
+    payloadLength := binary.BigEndian.Uint32(encodedPayload[2:6])
+
+    // decode raw data modulo bn254
+    nonPaddedData := codec.RemoveEmptyByteFromPaddedBytes(encodedPayload[32:])
+
+    reader := bytes.NewReader(nonPaddedData)
+    payload := make([]byte, payloadLength)
+    readLength, err := reader.Read(payload)
+    if err != nil {
+        return nil, fmt.Errorf(
+            "failed to copy unpadded data into final buffer, length: %d, bytes read: %d",
+            payloadLength, readLength)
+    }
+    if uint32(readLength) != payloadLength {
+        return nil, fmt.Errorf("data length does not match length prefix")
+    }
+
+    return payload, nil
+}

[Review comment — Contributor, on the length check above]
I'd prefer if this was in a self-contained function. Maybe we can change RemoveEmptyByteFromPaddedBytes to take a second len argument?
49 changes: 4 additions & 45 deletions api/clients/codecs/blob_codec_test.go
@@ -19,11 +19,8 @@ func randomByteSlice(length int64) []byte {
     return b
 }
 
-// TestIFFTCodec tests the encoding and decoding of random byte streams
-func TestIFFTCodec(t *testing.T) {
-    // Create an instance of the DefaultBlobEncodingCodec
-    codec := codecs.NewIFFTCodec(codecs.NewDefaultBlobCodec())
-
+// TestCodec tests the encoding and decoding of random byte streams
+func TestCodec(t *testing.T) {
     // Number of test iterations
     const iterations = 100
 
@@ -36,48 +33,10 @@ func TestIFFTCodec(t *testing.T) {
         originalData := randomByteSlice(length.Int64() + 1) // ensure it's not length 0
 
         // Encode the original data
-        encodedData, err := codec.EncodeBlob(originalData)
-        if err != nil {
-            t.Fatalf("Iteration %d: failed to encode blob: %v", i, err)
-        }
-
-        // Decode the encoded data
-        decodedData, err := codec.DecodeBlob(encodedData)
-        if err != nil {
-            t.Fatalf("Iteration %d: failed to decode blob: %v", i, err)
-        }
-
-        // Compare the original data with the decoded data
-        if !bytes.Equal(originalData, decodedData) {
-            t.Fatalf("Iteration %d: original and decoded data do not match\nOriginal: %v\nDecoded: %v", i, originalData, decodedData)
-        }
-    }
-}
-
-// TestIFFTCodec tests the encoding and decoding of random byte streams
-func TestNoIFFTCodec(t *testing.T) {
-    // Create an instance of the DefaultBlobEncodingCodec
-    codec := codecs.NewNoIFFTCodec(codecs.NewDefaultBlobCodec())
-
-    // Number of test iterations
-    const iterations = 100
-
-    for i := 0; i < iterations; i++ {
-        // Generate a random length for the byte slice
-        length, err := rand.Int(rand.Reader, big.NewInt(1024)) // Random length between 0 and 1023
-        if err != nil {
-            panic(err)
-        }
-        originalData := randomByteSlice(length.Int64() + 1) // ensure it's not length 0
-
-        // Encode the original data
-        encodedData, err := codec.EncodeBlob(originalData)
-        if err != nil {
-            t.Fatalf("Iteration %d: failed to encode blob: %v", i, err)
-        }
+        encodedData := codecs.EncodePayload(originalData)
 
         // Decode the encoded data
-        decodedData, err := codec.DecodeBlob(encodedData)
+        decodedData, err := codecs.DecodePayload(encodedData)
         if err != nil {
             t.Fatalf("Iteration %d: failed to decode blob: %v", i, err)
         }
61 changes: 0 additions & 61 deletions api/clients/codecs/default_blob_codec.go

This file was deleted.

39 changes: 39 additions & 0 deletions api/clients/codecs/fft_test.go
@@ -0,0 +1,39 @@
package codecs

import (
    "bytes"
    "testing"

    "github.com/Layr-Labs/eigenda/common/testutils/random"
    "github.com/stretchr/testify/require"
)

// TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
func TestFFT(t *testing.T) {
[Review comment — Contributor]
Suggested change:

 // TestFFT checks that data can be IFFTed and FFTed repeatedly, always getting back the original data
+// TODO: we should probably be using fuzzing instead of this kind of ad-hoc random search testing
 func TestFFT(t *testing.T) {

    testRandom := random.NewTestRandom(t)

    // Number of test iterations
    iterations := testRandom.Intn(100) + 1
[Review comment — Contributor]
This is a bit weird. Why not just always test the same number of iterations?


    for i := 0; i < iterations; i++ {
        originalData := testRandom.Bytes(testRandom.Intn(1024) + 1) // ensure it's not length 0

        encodedData := EncodePayload(originalData)
        coeffPoly, err := IFFT(encodedData)
        require.NoError(t, err)

        evalPoly, err := FFT(coeffPoly)
        require.NoError(t, err)

        // Decode the encoded data
        decodedData, err := DecodePayload(evalPoly)
        if err != nil {
            t.Fatalf("Iteration %d: failed to decode blob: %v", i, err)
        }

        // Compare the original data with the decoded data
        if !bytes.Equal(originalData, decodedData) {
            t.Fatalf("Iteration %d: original and decoded data do not match\nOriginal: %v\nDecoded: %v", i, originalData, decodedData)
        }
    }
}
39 changes: 0 additions & 39 deletions api/clients/codecs/ifft_codec.go

This file was deleted.

21 changes: 0 additions & 21 deletions api/clients/codecs/no_ifft_codec.go

This file was deleted.

12 changes: 12 additions & 0 deletions api/clients/codecs/polynomial_form.go
@@ -0,0 +1,12 @@
package codecs

// PolynomialForm is an enum that represents the different ways that a blob polynomial may be represented
type PolynomialForm uint

const (
    // Eval is short for "evaluation form". The field elements represent the evaluation at the polynomial's expanded
    // roots of unity
    Eval PolynomialForm = iota
    // Coeff is short for "coefficient form". The field elements represent the coefficients of the polynomial
    Coeff
)