Query CAR database according to epoch #3555

Open
1 of 3 tasks
lemmih opened this issue Oct 9, 2023 · 1 comment
Assignees
Labels
Performance Priority: 2 - High Very important and should be addressed ASAP Ready Issue is ready for work and anyone can freely assign it to themselves Type: Enhancement

Comments

@lemmih
Contributor

lemmih commented Oct 9, 2023

Issue summary

There are two kinds of data stores in Forest: Read-only CAR files and writeable ParityDB databases. Since all values are uniquely determined by their key, it does not matter for correctness which data store we query first. It does matter for performance, though, and we want to query as few data stores as possible.

Currently, we query the CAR data stores first, in the order they were added, and fall back to the ParityDB database only if the key isn't present in any of them. In regular operation, however, we're significantly more likely to query new data than old data, so querying the data stores with the latest data first is the better option. The ParityDB database contains current data and is therefore the newest store, so it should be queried first. The CAR data stores should then be queried in order from highest to lowest epoch.

  • Query ParityDB first.
  • Sort CAR data stores by the epoch of the heaviest tipset.
  • (Optional) Trim unusable CAR data stores if we know we'll never access their data. This happens if we're evaluating tipset 1000, and we have CAR stores for epochs 0-1500 and 1500-3000. The second CAR store will not be used for evaluating this tipset and can be removed from the list.
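The three items above could be sketched roughly as follows. This is a minimal illustration, not Forest's actual code: `CarStore`, `ManyStore`, `trim_for_epoch`, and the `lowest_epoch` field are hypothetical names, and both kinds of store are modelled as in-memory maps.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

type Key = String;
type Value = Vec<u8>;

/// Hypothetical stand-in for a read-only CAR store.
struct CarStore {
    /// Lowest epoch covered by this CAR file (assumed to be known).
    lowest_epoch: i64,
    /// Epoch of the heaviest tipset in this CAR file.
    heaviest_epoch: i64,
    blocks: HashMap<Key, Value>,
}

/// Hypothetical stand-in for the combined store.
struct ManyStore {
    /// Writable database (ParityDB in Forest), modelled here as a map.
    writable: HashMap<Key, Value>,
    /// Read-only CAR stores, kept sorted from highest to lowest epoch.
    read_only: Vec<CarStore>,
}

impl ManyStore {
    fn new(writable: HashMap<Key, Value>, mut read_only: Vec<CarStore>) -> Self {
        // Sort once, newest first, so lookups probe recent data before old data.
        read_only.sort_by_key(|store| Reverse(store.heaviest_epoch));
        Self { writable, read_only }
    }

    /// Optional trimming: drop CAR stores that start after `epoch`,
    /// since they cannot be needed when evaluating a tipset at `epoch`.
    fn trim_for_epoch(&mut self, epoch: i64) {
        self.read_only.retain(|store| store.lowest_epoch <= epoch);
    }

    fn get(&self, key: &str) -> Option<&Value> {
        // 1. Query the writable database first; it holds the newest data.
        if let Some(value) = self.writable.get(key) {
            return Some(value);
        }
        // 2. Fall back to the CAR stores, highest epoch first.
        self.read_only.iter().find_map(|store| store.blocks.get(key))
    }
}
```

With CAR stores for epochs 0-1500 and 1500-3000, calling `trim_for_epoch(1000)` in this sketch leaves only the first store in the list, matching the example in the last item.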

Other information and links

@lemmih lemmih added Priority: 3 - Medium Nice-to-have, does not impede core functionality Performance Ready Issue is ready for work and anyone can freely assign it to themselves labels Oct 9, 2023
@lemmih lemmih added Priority: 4 - Low Limited impact and can be implemented at any time and removed Priority: 3 - Medium Nice-to-have, does not impede core functionality labels Oct 21, 2023
@elmattic elmattic self-assigned this Feb 28, 2024
@lemmih lemmih added Type: Enhancement Priority: 2 - High Very important and should be addressed ASAP and removed Priority: 4 - Low Limited impact and can be implemented at any time labels Aug 2, 2024
@lemmih
Contributor Author

lemmih commented Aug 2, 2024

Calling .iter() on a BinaryHeap traverses the items in arbitrary order. We want to traverse the CAR files from the highest epoch to the lowest epoch.

forest/src/db/car/many.rs

Lines 159 to 163 in 6067514

for reader in self.read_only.read().iter() {
    if let Some(val) = reader.car.get(k)? {
        return Ok(Some(val));
    }
}
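One possible way to get an ordered traversal out of a `BinaryHeap` on stable Rust is to convert it with `into_sorted_vec()`, which returns the items in ascending order, and then reverse the result. The sketch below assumes a hypothetical `Reader` type whose ordering is driven by its epoch; it is not the type used in `many.rs`.

```rust
use std::collections::BinaryHeap;

/// Hypothetical reader entry; the derived `Ord` compares `epoch` first
/// because it is the first field.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Reader {
    epoch: i64,
    name: String,
}

/// Returns the readers ordered from highest to lowest epoch.
/// The heap itself is left untouched because we sort a clone.
fn readers_by_epoch_desc(heap: &BinaryHeap<Reader>) -> Vec<Reader> {
    // `into_sorted_vec` yields ascending order, so reverse it afterwards.
    let mut sorted = heap.clone().into_sorted_vec();
    sorted.reverse();
    sorted
}
```

Cloning and sorting on every lookup would be wasteful on a hot path, so a real fix might instead keep the readers in a `Vec` that is sorted once whenever a CAR file is added.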
