Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(metrics): pass the epoch manager to spawn_trie_metrics_loop #12811

Merged
merged 1 commit into from
Jan 27, 2025

Conversation

marcelo-gonzalez
Copy link
Contributor

This function creates its own epoch manager with new_from_genesis_config(), but this is not how the main epoch manager is created at startup, which is done with new_arc_handle(). These result in epoch managers that return different EpochConfigs for a given protocol version, since only the second one initializes an epoch config store from the genesis config, and the first one generates epoch configs with generate_epoch_config(). In this case it leads to a crash if the shard layouts are different because the call to ShardUId::from_shard_id_and_layout() will see the wrong shard IDs and crash at assert!(shard_layout.shard_ids().any(|i| i == shard_id));

This function creates its own epoch manager with new_from_genesis_config(),
but this is not how the main epoch manager is created at startup, which is
done with new_arc_handle(). These result in epoch managers that return different
EpochConfigs for a given protocol version, since only the second one initializes
an epoch config store from the genesis config, and the first one generates
epoch configs with generate_epoch_config(). In this case it leads to a crash if
the shard layouts are different because the call to `ShardUId::from_shard_id_and_layout()`
will see the wrong shard IDs and crash at `assert!(shard_layout.shard_ids().any(|i| i == shard_id));`
@marcelo-gonzalez marcelo-gonzalez requested a review from a team as a code owner January 27, 2025 21:41
Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

What exactly was broken?

Does this need to be included in the release?

@marcelo-gonzalez
Copy link
Contributor Author

nice

What exactly was broken?

Does this need to be included in the release?

Nothing in prod was broken, but I was getting a crash because of this when I ran a localnet test with 6 shards with use_production_config=True. It looks to me like any test in pytest/tests/sanity would prob fail if you set use_production_config=True bc of this. You get a crash because then the metrics loop thinks the shard layout should be ShardLayout::get_simple_nightshade_layout_v5(). I guess on testnet/mainnet the shard layouts will match up anyway? But could be a good idea to include this in the release just in case something like this is possible cc @VanBarbascu

@marcelo-gonzalez marcelo-gonzalez added this pull request to the merge queue Jan 27, 2025
Merged via the queue into near:master with commit 8097202 Jan 27, 2025
24 checks passed
@marcelo-gonzalez marcelo-gonzalez deleted the metrics-fix branch January 27, 2025 23:03
VanBarbascu pushed a commit that referenced this pull request Jan 29, 2025
This function creates its own epoch manager with
new_from_genesis_config(), but this is not how the main epoch manager is
created at startup, which is done with new_arc_handle(). These result in
epoch managers that return different EpochConfigs for a given protocol
version, since only the second one initializes an epoch config store
from the genesis config, and the first one generates epoch configs with
generate_epoch_config(). In this case it leads to a crash if the shard
layouts are different because the call to
`ShardUId::from_shard_id_and_layout()` will see the wrong shard IDs and
crash at `assert!(shard_layout.shard_ids().any(|i| i == shard_id));`
marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this pull request Jan 30, 2025
…#12811)

This function creates its own epoch manager with
new_from_genesis_config(), but this is not how the main epoch manager is
created at startup, which is done with new_arc_handle(). These result in
epoch managers that return different EpochConfigs for a given protocol
version, since only the second one initializes an epoch config store
from the genesis config, and the first one generates epoch configs with
generate_epoch_config(). In this case it leads to a crash if the shard
layouts are different because the call to
`ShardUId::from_shard_id_and_layout()` will see the wrong shard IDs and
crash at `assert!(shard_layout.shard_ids().any(|i| i == shard_id));`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants