Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disperser auth #984

Merged
merged 62 commits into from
Jan 8, 2025
Merged

Disperser auth #984

merged 62 commits into from
Jan 8, 2025

Conversation

cody-littley
Copy link
Contributor

Why are these changes needed?

Authenticate StoreChunks() requests.

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
@cody-littley cody-littley self-assigned this Dec 11, 2024
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
@cody-littley cody-littley marked this pull request as ready for review December 19, 2024 19:07
Copy link
Contributor

@ian-shim ian-shim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, few comments

api/clients/v2/request_signer.go Outdated Show resolved Hide resolved
api/clients/v2/node_client.go Outdated Show resolved Hide resolved
uint32 disperserID = 2;

// Signature using the disperser's ECDSA key over keccak hash of the batch. The purpose of this signature
// is to prevent hooligans from tricking DA nodes into storing data that they shouldn't be storing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣

disperser/cmd/controller/config.go Show resolved Hide resolved
node/auth/authenticator.go Outdated Show resolved Hide resolved
core/eth/reader.go Show resolved Hide resolved
return nil
}

key, err := a.getDisperserKey(ctx, now, request.DisperserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would result in an extra ETH call for any new DisperserID it sees.
Since we're already hardcoding the only allowed disperser, does it make sense if we consider a request invalid if DisperserID is unknown (not EigenLabsDisperserID) in isAuthenticationStillValid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	var defaultAddress gethcommon.Address
	if err != nil {
		return defaultAddress, fmt.Errorf("failed to get disperser address: %w", err)
	}
	if address == defaultAddress {
		return defaultAddress, fmt.Errorf("disperser with id %d not found", disperserID)
	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reply is for another comment in reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're correct @ian-shim, wires got crossed here. 🙃

The response I intended to type here was that the code does cache this value inside this cache: keyCache *lru.Cache[uint32, *keyWithTimeout].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case I was worried about is an exploit with sending requests with increasing disperser IDs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. You don't need to send increasing disperser IDs, since nothing is cached when a non-existent disperser ID is queried.

The core purpose of this caching was to reduce the latency of this RPC. The latency of an invalid request is not really that important. Even if reading the chain state is not particularly expensive, I agree that it's generally good practice to avoid doing as much as possible for invalid RPCs.

The solution in the immediate term is to simply reject any disperser ID other than 0. I've made that change.

When we eventually have multiple dispersers, we will need to add a function to the smart contract that returns a list of all currently valid disperser IDs. We can cache this list and refresh it every X seconds, and always reject any request for a disperser that is unknown. Unfortunately this change will not be backwards comparable, meaning that node operators will have a required upgrade when we enable multiple dispersers. Not the end of the world, but not much we can do to avoid it now I think (short of updating contracts again).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my fix, I add an additional parameter to the authentication: disperserIDFilter func(uint32) bool

If a disperser ID causes this function to return false, then the authentication attempt immediately fails.

The purpose of this extra complexity (as opposed to hard coding it) is to avoid having to delete unit tests. The authenticator is more or less capable of handling multiple dispersers (when we eventually have multiple dispersers), and I wrote unit tests to cover those scenarios. Although I could delete those for slightly simpler code, I'd like to avoid having to rewrite them when we get around to decentralized dispersers.

nodeClientManager, err := controller.NewNodeClientManager(config.NodeClientCacheSize, logger)

var requestSigner clients.RequestSigner
if !config.DisperserStoreChunksSigningDisabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the store chunks signing is disabled, we should probably throw a warning on the start up logs.

Copy link
Contributor Author

@cody-littley cody-littley Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	var requestSigner clients.DispersalRequestSigner
	if config.DisperserStoreChunksSigningDisabled {
		logger.Warn("StoreChunks() signing is disabled")
	} else {
		requestSigner, err = clients.NewDispersalRequestSigner(
			context.Background(),
			config.AwsClientConfig.Region,
			config.AwsClientConfig.EndpointURL,
			config.DisperserKMSKeyID)
		if err != nil {
			return fmt.Errorf("failed to create request signer: %v", err)
		}
	}


// authenticationTimeoutDuration is the duration for which an auth is valid.
// If this is zero, then auth saving is disabled, and each request will be authenticated independently.
authenticationTimeoutDuration time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this value low for the beginning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've set this to be one minute by default. Is that sufficient, or should we lower it further?

Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
config *NodeClientConfig
initOnce sync.Once
conn *grpc.ClientConn
requestSigner DispersalRequestSigner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly different structuring of it: shall we pass in an interface, not a concrete implementation?
Like we have RequestSigners interface (and passed around), but with a KMS based implementation. This leaves space for potential non-KMS (i.e. non AWS specific) options (in a decentralized scenario it may be needed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about this comment. DispersalRequestSigner is already an interface. Is the interface ok in its current form?

// DispersalRequestSigner encapsulates the logic for signing GetChunks requests.
type DispersalRequestSigner interface {
	// SignStoreChunksRequest signs a StoreChunksRequest. Does not modify the request
	// (i.e. it does not insert the signature).
	SignStoreChunksRequest(ctx context.Context, request *grpc.StoreChunksRequest) ([]byte, error)
}

&bind.CallOpts{
Context: ctx,
},
disperserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we settle on either ID or Key? Using both is unnecessary and adds a little bit confusion.
(if this is meant to be an int, ID seems fit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with calling this an "ID". These are serial numbers. Eventually when we allow the community to register dispersers, we will allocate these in monotonic order via a smart contract.

Where do you see the terminology "key" being used in this context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I see what you are getting at. There is a method in the chain reader that references the disperser key:

getDisperserKey(
	ctx context.Context,
	now time.Time,
	disperserID uint32) (*gethcommon.Address, error)

The disperser key and the disperser ID are actually distinct things.

  • Disperser ID: a unique identifier for the disperser, a 32 bit serial number
  • Disperser (public) key: the public key used to verify StoreChunks() requests from the disperser, is an eth address

The reason why the disperser's public key is not a good identifier for the disperser is that we want the capability to re-key the disperser (e.g. if we lose the key, it gets compromised, or we just want to rotate it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, seems like the “operator address" v.s. "operator ID" case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why DisperserKeyToAddress is accepting a disperserID as param?

Copy link
Contributor Author

@cody-littley cody-littley Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a map from disperser IDs (serial numbers) to disperser public keys (eth addresses). This function looks up the address given the disperser's ID. The disperser's public key needs to be registered on-chain proper to this call, since the public key is not derived from the ID. Until we decentralize dispersers, there will always be exactly one disperser ID: 0.

The reason why we decided to add this complexity now is that we wanted the ability to change the public key in case we need to rotate the disperser's keys. The disperser ID part was super simple to have, and will mean that we don't need to change this contract in the future when we move to decentralized dispersers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea the context makes sense. What I mean was why DisperserKeyToAddress is not called DisperserIDToAddress, since it's accepting an ID and mapping it to address.

@@ -0,0 +1,62 @@
package clients
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can update the file name as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot to rename the file. 😅 Fixed.

return nil
}

key, err := a.getDisperserKey(ctx, now, request.DisperserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reply is for another comment in reader.

// chainReader is used to read the chain state.
chainReader core.Reader

// keyCache is used to cache the public keys of dispersers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what public key has only 32 bits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uint32 is the relay ID. These are serial numbers, not eth keys. Currently we have exactly one disperser, and its ID is hard coded to be uint32(0).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the comment then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comments to clarify:

// keyCache is used to cache the public keys of dispersers. The uint32 map keys are disperser IDs. Disperser
// IDs are serial numbers, with the original EigenDA disperser assigned ID 0. The map values contain
// the public key of the disperser and the time when the local cache of the key will expire.
keyCache *lru.Cache[uint32 /* disperser ID */, *keyWithTimeout]

return fmt.Errorf("failed to verify request: %w", err)
}

a.saveAuthenticationResult(now, origin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cacheAuthenticationResult, "save" feels like it's persisting the data which it's not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

}
}

address, err := a.chainReader.GetDisperserAddress(ctx, disperserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd cache this ID to address mapping to save RPCs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cached (see keyCache *lru.Cache[uint32, *keyWithTimeout]). By design, we re-fetch the same key again after a timeout just in case the key was changed. But not for each RPC.

@@ -238,6 +238,33 @@ var (
EnvVar: common.PrefixEnvVar(EnvVarPrefix, "CHUNK_DOWNLOAD_TIMEOUT"),
Value: 20 * time.Second,
}
DisableDispersalAuthenticationFlag = cli.BoolFlag{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why disabling is an option? shouldn't it be non optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's enabled by default. Necessary to get things working in inabox. Inabox integration is kind of complex, so I'd like to merge this code with it disabled in the e2e test, and to enable it in a follow up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file might be better fit in api/clients

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved all GRPC hashing code to a new package api/hashing. There is now no longer a dependency (outside of test code) between the client and this file.

@@ -3,6 +3,7 @@ package clients
import (
"context"
"fmt"
"github.com/Layr-Labs/eigenda/node/auth"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to me this is a good dependency direction for api/ to depend on node/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dependency is no longer present

Signed-off-by: Cody Littley <cody@eigenlabs.org>
Signed-off-by: Cody Littley <cody@eigenlabs.org>
Copy link
Contributor

@jianoaix jianoaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// chainReader is used to read the chain state.
chainReader core.Reader

// keyCache is used to cache the public keys of dispersers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the comment then?

&bind.CallOpts{
Context: ctx,
},
disperserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why DisperserKeyToAddress is accepting a disperserID as param?

Signed-off-by: Cody Littley <cody@eigenlabs.org>
grpc "github.com/Layr-Labs/eigenda/api/grpc/node/v2"
"github.com/Layr-Labs/eigenda/api/hashing"
"github.com/Layr-Labs/eigenda/common"
"github.com/aws/aws-sdk-go-v2/aws"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd prefer to have a thin kms wrapper client and use it to interface with kms instead of directly importing aws libraries. But feel free to ignore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is already just a thin wrapper, although its use case specific and not general purpose. Let's discuss this and make any changes in a follow up PR.

common/kms.go Outdated
@@ -0,0 +1,167 @@
package common
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aws related utils are under common/aws/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to common/aws

return nil
}

key, err := a.getDisperserKey(ctx, now, request.DisperserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case I was worried about is an exploit with sending requests with increasing disperser IDs

Signed-off-by: Cody Littley <cody@eigenlabs.org>
@cody-littley cody-littley merged commit 344a052 into Layr-Labs:master Jan 8, 2025
6 checks passed
@cody-littley cody-littley deleted the disperser-auth branch January 8, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants