This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Add push and get methods for RestartLastVotedForkSlots #33613

Merged

Conversation

wen-coding
Contributor

Add push_restart_last_voted_fork_slots and get_restart_last_voted_fork_slots.

@codecov

codecov bot commented Oct 10, 2023

Codecov Report

Merging #33613 (99c19f6) into master (b4c652e) will increase coverage by 0.0%.
Report is 1 commit behind head on master.
The diff coverage is 99.0%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #33613    +/-   ##
========================================
  Coverage    81.9%    81.9%            
========================================
  Files         816      816            
  Lines      219753   219911   +158     
========================================
+ Hits       180001   180169   +168     
+ Misses      39752    39742    -10     

Comment on lines 970 to 976
let mut last_voted_fork_slots = RestartLastVotedForkSlots::new(
    self.id(),
    now,
    last_vote_bankhash,
    self.my_shred_version(),
);
last_voted_fork_slots.fill(update);
Contributor

Why do we need this two-stage initialization?
Can't new fill out the slots as well?

Also, the update logic is quite different from push_epoch_slots; i.e. the existing value in gossip CRDS is entirely discarded. Is that intentional? If so, the RestartLastVotedForkSlots API could be quite a bit simpler (e.g. no need for two-stage initialization).

Contributor Author

Now we only have one packet per validator; yes, the value with the newer timestamp should always replace the old one.

The two-stage initialization is still there because we need self.max_compressed_slot_size(), which can't be called in a constructor. But maybe we can approximate, assuming the header etc. is around 100+ bytes?

Contributor

In this initialization, max_compressed_slot_size is always called on a RestartLastVotedForkSlots with empty slots, so it is a constant number. You can define it as a const and then define a test which validates the const value. Similar to:
https://github.com/solana-labs/solana/blob/09e858d93/gossip/src/cluster_info.rs#L3442-L3448

Also, looking at the code, when there are too many slots, fill takes the initial set of slots and ignores the remaining ones.
You probably want to do the opposite; i.e. take the last slots, not the first slots.

Contributor Author

Good point. Actually I don't care too much about time complexity here, because this is only called once on start, so I can try to pack as many slots as possible into one packet. Changed.

gossip/src/cluster_info.rs (outdated review thread; resolved)
Comment on lines 523 to 542
last_voted_fork.sort();
let mut compressed_slots = CompressedSlots::new(1);
let fork_length = last_voted_fork.len();
let last_voted_slot = last_voted_fork.last();
for slice_length in
    max(1, min(fork_length, MAX_RESTART_LAST_VOTED_FORK_SLOTS_SPACE))..fork_length + 1
{
    let mut new_slots = CompressedSlots::new(slice_length);
    let new_slice = &last_voted_fork[fork_length.saturating_sub(slice_length)..];
    new_slots.add(new_slice);
    let _ = new_slots.deflate();
    if last_voted_slot.unwrap() - new_slots.first_slot() < MAX_SLOTS_PER_ENTRY as u64
        && serialized_size(&new_slots).unwrap()
            < MAX_RESTART_LAST_VOTED_FORK_SLOTS_SPACE as u64
    {
        compressed_slots = new_slots;
    } else {
        break;
    }
}
Contributor

This is now pretty confusing and hacky. Can we do something better here?

I think part of the complexity comes from using CompressedSlots; we need a much simpler construct here because, unlike EpochSlots, this is not updating/appending to old values.

For example, just encode the last slot, plus a bit-vector representing the parents.

Contributor Author

Since we only have ~800 bytes available, a straight bit-vector can only encode 7.2k slots, so I put flate2 compression in there.

Contributor

Are you sure that flate2 compression doesn't make it worse?
I mean, the bits representing parent slots could be totally random with no patterns, in which case compression could make it worse.
There is also this concern with using compression: #14362 (comment)

Contributor Author

That is a valid concern. The problem with not using compression is that we could only cover 7.2k slots going back in time, which is roughly 48 minutes; this is really too short, because currently we rarely resolve an outage within 48 minutes. (Even if your local vote chains to a parent generated 2 hours ago, your bitmap for the past 48 minutes would be all 0's, so it carries zero information.)

About whether compression will make it worse: I tested a few cases, and it seems compression handles long runs of 0's or 1's really well. Think about what happens in an outage (cc @carllin since he expressed similar concerns):

  1. Outages where blocks arrive very slowly (outages 20230225, 20220430): most validators can't really vote on the blocks generated by others, so each votes on a local fork with blocks it generated itself.
  2. Outages where new blocks can't chain off each other and all keep building from the same parent (outage 20220930): same as above, each votes on a local fork with blocks it generated itself.
  3. Outages where >1/3 of the validators no longer agree (outage 20220601): in this case we normally have several big forks, so you should see >1/3 of 1's in the forks we care about; that means many consecutive 1's.

So I think in the outage scenarios we have seen, compression will generally help. If we are concerned about compression here, we can implement our own compression in a separate PR if needed, or we can consider sending more than one RestartLastVotedForkSlots packet.

Contributor

If you expect a lot of consecutive 1's in the parent chain, then maybe we can use run-length encoding for the 1's:
https://en.wikipedia.org/wiki/Run-length_encoding
The lengths can also be encoded with varint encoding:
https://github.com/solana-labs/solana/blob/54b796f5a/sdk/program/src/serde_varint.rs

I don't know if it is going to be better or not, and the worst case still adds overhead compared to unencoded raw data.
Do you want to run some experiments on how the above compares against flate2 compression?

Contributor Author

I think during outages there will be lots of consecutive 0's or consecutive 1's in the bitmap: consecutive 0's happen when blocks arrive very slowly, and consecutive 1's happen when >1/3 of the validators diverge. In both cases run-length encoding can probably compress the bitmap tightly enough to fit 16k slots. I'll have a try at implementing a very simple run-length encoding.

Contributor Author

I replaced flate2 compression with a simple run-length encoding implementation. In the best case (long runs of consecutive 0's or 1's) the compression ratio is 127:1; in the worst case (very random bits) it performs worse than raw bits. The compression ratio is comparable to flate2, but it's much easier to fit everything into the given byte array. The previous flate2 implementation also had a bug: at the end of compression it needs to write some headers into the output buffer, and the header length is very hard to predict.

Contributor Author

And RestartLastVotedForkSlots will select raw bits if that performs better.
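A minimal sketch of the run-length scheme discussed in this thread, under the stated convention that the bit vector always starts with a 1 (the last voted slot), so only alternating run lengths of 1's and 0's need to be stored. Function names and the use of a plain bool slice instead of the real BitVec are my assumptions, not the PR's actual implementation:

```rust
// Encode a bit vector as alternating run lengths, starting with a run of 1's.
// For example, 110000111 encodes to [2, 4, 3], matching the comment in the PR.
fn rle_encode(bits: &[bool]) -> Vec<u16> {
    let mut runs: Vec<u16> = Vec::new();
    let mut current = true; // by construction the vector starts with a 1
    let mut count: u16 = 0;
    for &bit in bits {
        if bit == current {
            count += 1;
        } else {
            runs.push(count);
            current = bit;
            count = 1;
        }
    }
    if count > 0 {
        runs.push(count);
    }
    runs
}

// Decode by expanding each run, alternating 1-runs and 0-runs.
fn rle_decode(runs: &[u16]) -> Vec<bool> {
    runs.iter()
        .zip([true, false].iter().cycle())
        .flat_map(|(&run, &bit)| std::iter::repeat(bit).take(usize::from(run)))
        .collect()
}
```

With u16 run lengths, a single two-byte entry can cover up to 65535 identical bits, which is where the large best-case ratio for long runs comes from.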

    pub last_voted_hash: Hash,
    pub shred_version: u16,
}

impl Sanitize for RestartLastVotedForkSlots {
    fn sanitize(&self) -> std::result::Result<(), SanitizeError> {
        if self.slots.is_empty() {
            if self.last_voted_slot == 0 {
Contributor

why?

Contributor Author

Because you will almost never encode last_vote_slot as 0?

Contributor

So is the case for slot 1 or 2 or 3; should we rule those out as well?
If nothing breaks with last_voted_slot == 0, then I don't think sanitize should bother with it.

Contributor Author

Removed.

gossip/src/crds_value.rs (three outdated review threads; resolved)
);
}
Err(e) => error!("Error compressing slots {:?}", e),
_ => (),
Contributor

What cases is _ ignoring?

Contributor Author

Changed.

gossip/src/crds_value.rs (outdated review thread; resolved)
pub fn to_slots(&self, min_slot: Slot) -> Result<Vec<Slot>, DecompressError> {
    let mut slots = Vec::new();
    if self.last_voted_slot >= min_slot {
        let mut uncompressed = Vec::with_capacity(self.uncompressed_bytes as usize);
Contributor

How come EpochSlots does not need to encode uncompressed_bytes?

Contributor Author

EpochSlots uses self.num to see how many slots are encoded, which is similar to self.uncompressed_bytes.

Contributor

A malicious node can cause OOM by sending a very large uncompressed_bytes value, and you are not sanitizing this field.
Bounding it has at most a minor impact on performance. We can remove the field entirely and instead use something like:

Vec::with_capacity(self.compressed_slots.len() * 2)

Contributor Author

Actually we don't need more than MAX_SLOTS_PER_ENTRY; changed.
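The fix discussed here boils down to clamping a sender-controlled size hint to a protocol-level bound before allocating. A small sketch, where bounded_capacity and the MAX_SLOTS_PER_ENTRY value are my stand-ins for the real names and constant:

```rust
// Assumed stand-in for the protocol's per-entry slot limit (16k in the discussion above).
const MAX_SLOTS_PER_ENTRY: usize = 16_384;

// Treat the untrusted uncompressed_bytes field as a hint only: never allocate
// more than the protocol-level maximum, whatever the sender claims.
fn bounded_capacity(claimed_uncompressed_bytes: u32) -> usize {
    (claimed_uncompressed_bytes as usize).min(MAX_SLOTS_PER_ENTRY)
}
```

This keeps the fast path (honest senders get exactly the capacity they asked for) while capping the worst case a malicious sender can force.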

gossip/src/crds_value.rs (two outdated review threads; resolved)
let uncompressed_bytes;
if last_voted_fork.is_empty() {
    return Err("Last voted slot must be specified".to_string());
} else {
Contributor

The line above is a return, so this else and the extra indentation below are redundant.
We also don't need the uninitialized variables above (e.g. let last_voted_slot;) if the else is removed.

    return Err("Last voted slot must be specified".to_string());
} else {
    last_voted_fork.sort_by_key(|a| Reverse(*a));
    last_voted_slot = last_voted_fork[0];
Contributor

[0] can panic. There is a last_voted_fork.is_empty() check above, but the better style is to remove that and instead change this line to:

let Some(last_voted_slot) = last_voted_fork.get(0).copied() else {
    return Err(...);
};

gossip/src/crds_value.rs Outdated Show resolved Hide resolved
Err(e) => return Err(e.to_string()),
// compression ended successfully, proceed.
Ok(flate2::Status::StreamEnd) => (),
Ok(flate2::Status::BufError) => (),
Contributor

I am guessing that in this case some of the slots are dropped because there is no more buffer to write to.
Can you note this in a comment in the code?
Also, can you add a test that in this case the other nodes still receive a valid payload which decompresses to a valid list of slots, consistent with the most recent slots in the original vector?

Several outdated review threads on gossip/src/crds_value.rs, gossip/src/cluster_info.rs, and gossip/1 (all resolved)
pub fn new(bits: &BitVec<u8>) -> Self {
    let mut encoded: Vec<U16> = Vec::new();
    let mut current_bit = 1;
    let mut current_bit_count: u16 = 0;
Contributor

Why is u16 sufficient here?
Can't the gaps exceed 65535?

Contributor Author

Because when we generate the encoding we only send MAX_SLOTS_PER_ENTRY slots, which is 16k, so u16 is enough.

Comment on lines 507 to 508
// The vector always starts with 1. Encode number of 1's and 0's consecutively.
// For example, 110000111 is [2, 4, 3].
Contributor

why start with 1?
are you including last_voted_slot in the bit-vec as well?

Contributor Author

Yes, last_voted_slot is included in the bit-vec. If we always start with 1, we don't need to encode whether the first value is 0 or 1.

gossip/src/crds_value.rs (six outdated review threads; resolved)
let encoded = (0..bits.len())
    .map(|i| bits.get(i))
    .dedup_with_count()
    .map(|(count, _)| count)
Contributor

.map_while(|(count, _value)| u16::try_from(count).ok())

instead of .map here, and remove u16::try_from from valid_and_has_space.
You also don't need the .try_into().unwrap() below.

Contributor Author

Done.

    .map(|i| bits.get(i))
    .dedup_with_count()
    .map(|(count, _)| count)
    .take_while(|count| Self::valid_and_has_space(&mut total_serde_bytes, *count))
Contributor

This transformation is technically a scan, not a take_while.
See the example here:
https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.scan

Contributor Author

Done.
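The scan-based rewrite the reviewer suggests can be sketched as follows. The state threaded through scan is the cumulative serialized byte count; returning None stops the iterator, which is the cutoff behavior the take_while version emulated with a mutable capture. MAX_BYTES, varint_len, and take_within_budget are assumed names for illustration:

```rust
// Assumed budget for the serialized payload (824 bytes in the PR discussion).
const MAX_BYTES: usize = 824;

// Serialized length of a varint-encoded u16: 7 payload bits per byte, min 1 byte.
fn varint_len(count: u16) -> usize {
    (((u16::BITS - count.leading_zeros() + 6) / 7).max(1)) as usize
}

// Keep run lengths only while their cumulative serialized size fits the budget.
fn take_within_budget(run_lengths: &[u16]) -> Vec<u16> {
    run_lengths
        .iter()
        .scan(0usize, |used, &count| {
            *used += varint_len(count);
            (*used <= MAX_BYTES).then_some(count)
        })
        .collect()
}
```

Unlike take_while, scan makes the statefulness explicit in the signature of the closure rather than hiding it in a captured &mut.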

}

pub fn slots_count(&self) -> usize {
    self.encoded.iter().map(|x| x.0 as usize).sum::<usize>()
Contributor

as has no type-safety checks.
Use usize::from instead.

Contributor Author

Done.

    .encoded
    .iter()
    .zip([1, 0].iter().cycle())
    .flat_map(|(bit_count, bit)| std::iter::repeat(bit).take(bit_count.0.into()))
Contributor

.into is ambiguous as to what type conversion happens here.
Can you use usize::from here?

Contributor Author

Done.

Comment on lines 559 to 565
    .filter_map(|(offset, bit): (usize, &i32)| {
        if *bit > 0 {
            Some(last_slot.saturating_sub(offset as Slot))
        } else {
            None
        }
    })
Contributor

A separate filter then map here would be more readable than a filter_map with an if branch:

.filter(|(_offset, bit)| *bit == 1)
.map(|(offset, _bit)| last_slot.saturating_sub(offset as Slot))

Contributor Author

Done.

Comment on lines 588 to 599
for i in 0..self.bits.len() {
    if self.bits.get(i) {
        let slot = last_slot.saturating_sub(i as Slot);
        if slot < min_slot {
            return result;
        }
        result.push(slot);
        if result.len() > MAX_SLOTS_PER_ENTRY {
            break;
        }
    }
}
Contributor

This can also be written using iterators instead of manual for loops and if branches; something like:

(0..self.bits.len())
    .filter(|k| self.bits.get(k))
    .map(|offset| last_slot.saturating_sub(offset as Slot))
    .take_while(|slot| slot >= min_slot)
    .take(MAX_SLOTS_PER_ENTRY)

Contributor Author

Done.

Comment on lines 632 to 636
let (first_voted_slot, last_voted_slot) = match last_voted_fork.iter().minmax() {
    NoElements => return Err("Last voted slot must be specified".to_string()),
    OneElement(slot) => (slot, slot),
    MinMax(min, max) => (min, max),
};
Contributor

let Some((first_voted_slot, last_voted_slot)) = last_voted_fork.iter().minmax().into_option() else {
    return Err(...);
};

Contributor Author

Done.

gossip/src/crds_value.rs (review thread; resolved)
    uncompressed_bitvec.set(last_voted_slot - *slot, true);
}
let run_length_encoding = RunLengthEncoding::new(&uncompressed_bitvec);
let raw_offsets = RawOffsets::new(uncompressed_bitvec);
Contributor

This line should go inside the else block.

Contributor Author

Done.

@@ -962,6 +962,23 @@ impl ClusterInfo {
}
}

pub fn push_restart_last_voted_fork_slots(&self, update: &[Slot], last_vote_bankhash: Hash) {
Contributor

update is pretty ambiguous here.
Can we name the arg fork?

Contributor Author

Done.

Comment on lines 520 to 523
if *current_bytes > RestartLastVotedForkSlots::MAX_BYTES as u32 {
return None;
}
Some(U16(count))
Contributor

*current_bytes += ((u16::BITS - count.leading_zeros() + 6) / 7).max(1) as usize;
(*current_bytes <= RestartLastVotedForkSlots::MAX_BYTES).then_some(U16(count))

Contributor Author

Done.
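The suggested one-liner computes the serialized length of a varint-encoded u16 without actually encoding it: a varint stores 7 payload bits per byte, so the length is the bit width divided by 7 rounded up, with a minimum of one byte for zero. A sketch checking it against a brute-force LEB128-style length count (function names are mine):

```rust
// Closed-form varint length, as in the reviewer's suggestion.
fn varint_len_formula(count: u16) -> usize {
    (((u16::BITS - count.leading_zeros() + 6) / 7).max(1)) as usize
}

// Brute-force reference: emit 7 bits per byte until the value is exhausted.
fn varint_len_encoded(mut count: u16) -> usize {
    let mut len = 1;
    count >>= 7;
    while count != 0 {
        len += 1;
        count >>= 7;
    }
    len
}
```

The interesting boundaries are 127/128 (one vs. two bytes) and 16383/16384 (two vs. three bytes), which the formula handles via leading_zeros.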

    Self { encoded }
}

pub fn slots_count(&self) -> usize {
Contributor

Can you rename slots_count to num_encoded_slots?
Also this function shouldn't be pub.

Contributor Author

Done.

}

impl RunLengthEncoding {
    pub fn new(bits: &BitVec<u8>) -> Self {
Contributor

Not pub.

Can you also go over all extraneous pubs in this code and remove them?

Contributor Author

Done.

    .zip([1, 0].iter().cycle())
    .flat_map(|(bit_count, bit)| std::iter::repeat(bit).take(usize::from(bit_count.0)))
    .enumerate()
    .filter(|(_, bit)| **bit > 0)
Contributor

**bit == 1

Contributor Author

Done.

}

pub fn to_slots(&self, last_slot: Slot, min_slot: Slot) -> Vec<Slot> {
    let mut result: Vec<Slot> = self
Contributor

Can you rename result to slots?

Contributor Author

Done.

    .flat_map(|(bit_count, bit)| std::iter::repeat(bit).take(usize::from(bit_count.0)))
    .enumerate()
    .filter(|(_, bit)| **bit > 0)
    .map(|(offset, _)| last_slot.saturating_sub(offset as Slot))
Contributor

This should probably stop if offset > last_slot.
We cannot assume values received from other nodes are sane.

.map_while(|(offset, _bit)| {
    let offset = Slot::try_from(offset).ok()?;
    last_slot.checked_sub(offset)
})

Contributor Author

Done.

}

pub fn to_slots(&self, last_slot: Slot, min_slot: Slot) -> Vec<Slot> {
    let mut result: Vec<Slot> = (0..self.bits.len())
Contributor

rename result to slots.

Contributor Author

Done.

pub fn to_slots(&self, last_slot: Slot, min_slot: Slot) -> Vec<Slot> {
    let mut result: Vec<Slot> = (0..self.bits.len())
        .filter(|index| self.bits.get(*index))
        .map(|offset| last_slot.saturating_sub(offset))
Contributor

.map_while(|offset| last_slot.checked_sub(offset))

Contributor Author

Done.

last_voted_fork: &[Slot],
last_voted_hash: Hash,
shred_version: u16,
) -> Result<Self, String> {
Contributor

Can you define an actual error type, say RestartLastVotedForkSlotsError, instead of String?

Contributor Author

Done.

gossip/src/crds_value.rs (review thread; resolved)
Comment on lines 590 to 594
#[allow(dead_code)]
#[derive(Debug, Clone)]
pub struct RestartLastVotedForkSlotsError {
    error: String,
}
Contributor

Contributor Author

Done.

gossip/src/crds_value.rs (review thread; resolved)
struct U16(#[serde(with = "serde_varint")] u16);

#[derive(Deserialize, Serialize, Clone, Debug, PartialEq, Eq, AbiExample)]
struct RunLengthEncoding {
Contributor

Can this just be

struct RunLengthEncoding(Vec<u8>);

to be less verbose?

Contributor Author

Done.

CrdsData::RestartLastVotedForkSlots(last_voted_fork_slots),
&self.keypair(),
)),
Err(e) => error!("failed to create RestartLastVotedForkSlots {e:?}"),
Contributor

The code here does not have enough context to know whether logging the error and then ignoring it is sufficient.
I would rather this returned the error to the caller.

Contributor Author

Done.

    let offset = Slot::try_from(offset).ok()?;
    last_slot.checked_sub(offset)
})
.take(MAX_SLOTS_PER_ENTRY)
Contributor

Why do we need this MAX_SLOTS_PER_ENTRY here?
This seems like a left-over from the older code which was using EpochSlots.

There is already a RestartLastVotedForkSlots::MAX_BYTES which limits the number of slots for entries created by this node.
The entries received from other nodes are also bounded by what fits into PACKET_DATA_SIZE.

Contributor Author

RunLengthEncoding can potentially pack tons of zeroes in it, but we only send the 1's back to the caller, and that is bounded by the local root, so I think you are right that we can remove this.

Contributor

Well, we can pack tons of ones as well, and last_root can be equal to u64::MAX, so the sender can make us emit a lot of slots through the RLE encoding.
Now that I think about it, we actually do need to limit the length of the output vector here, because we cannot trust the value.
The limit does not need to be MAX_SLOTS_PER_ENTRY, since those are no longer related, but we do in fact need a .take(max_slots) here. Sorry for the back and forth on this.

RawOffsets::to_slots does not need a limit because that is already limited by the packet size.

Contributor Author

Thanks for catching this. I think we can go back to the 81k slots in the original design doc then. However, with 81k slots u16 no longer fits, so I've changed it to u32.

    .filter(|index| self.0.get(*index))
    .map_while(|offset| last_slot.checked_sub(offset))
    .take_while(|slot| *slot >= min_slot)
    .take(MAX_SLOTS_PER_ENTRY)
Contributor

similarly here re MAX_SLOTS_PER_ENTRY.

Contributor Author

Removed.

@@ -566,6 +571,9 @@ impl RawOffsets {
}
}

// Per design doc, we should start wen_restart within 9 hours.
pub const MAX_SLOTS_RESTART_LAST_VOTED_FORK_SLOTS: usize = 81000;
Contributor

This definition also needs to go inside struct RestartLastVotedForkSlots, similar to MAX_BYTES.

With 5000 nodes each sending 81k slots, it would take 3GB of memory just to store all those Vec<Slot>.
Can we reduce this limit to 65,535 so that we can stick with u16?

Contributor Author

I think that's fine. The original argument for 9 hours was so that admins could wake up and join the restart, but if they sleep soundly enough even 9 hours wouldn't help. In reality we normally have >80% of admins awake within 7 hours.
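The back-of-the-envelope behind the reviewer's 3GB figure, assuming a Slot is a u64 (8 bytes) and 5000 nodes each contribute the maximum number of decoded slots:

```rust
// Worst-case memory to hold every node's decoded Vec<Slot>:
// nodes * slots_per_node * size_of::<u64>().
fn worst_case_bytes(nodes: u64, slots_per_node: u64) -> u64 {
    nodes * slots_per_node * 8
}
```

At 81k slots per node this is about 3.24 GB; dropping the limit to 65,535 (u16::MAX) brings it down to roughly 2.6 GB while letting run lengths stay u16.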

behzadnouri
behzadnouri previously approved these changes Nov 16, 2023
Contributor

@behzadnouri left a comment

lgtm

@@ -600,6 +597,9 @@ impl RestartLastVotedForkSlots {
// This number is MAX_CRDS_OBJECT_SIZE - empty serialized RestartLastVotedForkSlots.
const MAX_BYTES: usize = 824;

// Per design doc, we should start wen_restart within 7 hours.
pub const MAX_SLOTS: usize = 65535;
Contributor

Does this need to be pub?
Also, u16::MAX as usize or 0xFFFF might make it more obvious where this number comes from.

Contributor Author

Changed. It needs to be pub because later, when we create the fork from the wen_restart code, we can limit the fork size to 65535.

@wen-coding wen-coding merged commit 3081b43 into solana-labs:master Nov 16, 2023
17 checks passed
@wen-coding wen-coding deleted the push_get_restart_last_voted_fork_slots branch November 16, 2023 20:36
@wen-coding wen-coding self-assigned this Apr 30, 2024