Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Blockstore::get_sigs_for_addr2: ensure lowest_slot >= first_available_block #33556

Merged
merged 3 commits into from
Oct 6, 2023

Conversation

CriesofCarrots
Copy link
Contributor

Problem

As pointed out here, Blockstore::get_confirmed_signatures_for_address2() does more iterating than it needs to.

Summary of Changes

Ensure lowest_slot >= first_available_block:

  • Update get_transaction_status to only return data from first_available_block and onward (this is the expected behavior from the perspective of rpc already)
  • When a transaction status is not found, set lowest_slot to first_available_block

@codecov
Copy link

codecov bot commented Oct 6, 2023

Codecov Report

Merging #33556 (a85792b) into master (6f1922b) will decrease coverage by 0.1%.
Report is 1 commits behind head on master.
The diff coverage is 87.5%.

@@            Coverage Diff            @@
##           master   #33556     +/-   ##
=========================================
- Coverage    81.7%    81.7%   -0.1%     
=========================================
  Files         805      805             
  Lines      218162   218160      -2     
=========================================
- Hits       178410   178394     -16     
- Misses      39752    39766     +14     

Comment on lines 2385 to 2388
let first_available_block = self.get_first_available_block()?;
if slot < first_available_block {
return Ok(None);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need this check here after the fact. Within get_transaction_status_with_counter(), we get the lowest_cleanup_slot and build our iterator in the forward direction from that slot

let index_iterator = self.transaction_status_cf.iter(IteratorMode::From(
(
transaction_status_cf_primary_index,
signature,
lowest_available_slot,
),
IteratorDirection::Forward,

So, it is the case that if we find a matching sig, then the slot the sig was in is >= lowest_cleanup_slot. We then grab the transaction status and return it; assuming it is valid, I see no reason to discard it at the end.

Also, get_transaction_status_with_counter() holds the lowest_cleanup_slot lock. Imagine the following sequence:

  1. get_transaction_status() calls get_transaction_status_with_counter()
  2. get_transaction_status_with_counter() grabs the lowest_cleanup_slot lock, which is at slot X
  3. get_transaction_status_with_counter() finds the transaction status for desired sig in slot X + i
  4. get_transaction_status_with_counter() returns and gives up lowest_cleanup_slot lock
  5. LedgerCleanupService comes along and does some cleanup, advancing lowest_cleanup_slot to X + c where c > i
  6. get_transaction_status()checks the result slot againstget_first_available_block(); it is the case that get_first_available_block() > X + c`
  7. From 5), i < c so the result will get discarded and we'll return Ok(None)

This isn't a dangerous race condition necessarily (and given the timing required, I think this would be incredibly unlikely), but again, I don't see any reason to discard the result if we already looked it up and found it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, maybe hoist this logic to get_transaction_status_with_counter(), inside the lock, then.
If we don't, the Some() cases in the before and until blocks in get_confirmed_signatures_for_address2() could return a slot/lowest_slot > first_available_block, because lowest_cleanup_slot is not necessarily (in fact usually isn't) the same as first_available_block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, I think we already have the race you described between get_transaction_status() and get_confirmed_signatures_for_address2(), because we pull the whole block after releasing the lowest_cleanup_slot lock. But fixing that, if we choose to, can be independent of this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because lowest_cleanup_slot is not necessarily (in fact usually isn't) the same as first_available_block.

Talking this through "out-loud" to help gather my thoughts ... here is the definition of get_first_available_block():

pub fn get_first_available_block(&self) -> Result<Slot> {
let mut root_iterator = self.rooted_slot_iterator(self.lowest_slot_with_genesis())?;
let first_root = root_iterator.next().unwrap_or_default();
// If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct
// to return it as first-available.
if first_root == 0 {
return Ok(first_root);
}
// Otherwise, the block at root-index 0 cannot ever be complete, because it is missing its
// parent blockhash. A parent blockhash must be calculated from the entries of the previous
// block. Therefore, the first available complete block is that at root-index 1.
Ok(root_iterator.next().unwrap_or_default())

And here is lowest_slot_with_genesis():
fn lowest_slot_with_genesis(&self) -> Slot {
for (slot, meta) in self
.slot_meta_iterator(0)
.expect("unable to iterate over meta")
{
if meta.received > 0 {
return slot;
}
}
// This means blockstore is empty, should never get here aside from right at boot.
self.last_root()
}

So, in normal conditions, lowest_slot_with_genesis() will return the first slot with any shreds (not necessarily complete and thus also not necessarily a root). Let's call this slot S_l. get_first_available_block() then looks for the first rooted slot from S_l, and then returns the next rooted slot. So, if S_l did happen to be a root, the result of get_first_available_block() will be S_l's child

lowest_cleanup_slot is the most recent slot that we purged. So, assuming lowest_cleanup_slot == C, then

  • First root in database >= C + 1
  • get_first_available_block() result >= C + 2

@CriesofCarrots CriesofCarrots requested a review from steviez October 6, 2023 18:36
@@ -8017,6 +8012,7 @@ pub mod tests {

if simulate_ledger_cleanup_service {
*blockstore.lowest_cleanup_slot.write().unwrap() = lowest_cleanup_slot;
blockstore.purge_slots(0, lowest_cleanup_slot, PurgeType::CompactionFilter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did unit test fail without this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because Blockstore::first_available_block() depends on the rooted-slot iterator, and this "simulation" wasn't adjusting the root list, just writing a new lowest_cleanup_slot. This change seemed defensible to me, since this is exactly what the LedgerCleanupService does, plus some extra data reporting.

Copy link
Contributor

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My long-winded comment aside, think we're good to here.

@CriesofCarrots CriesofCarrots merged commit f075867 into solana-labs:master Oct 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants