-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit amendment flapping (fixes #4350) #4364
Conversation
Great, happy to assign this to you and delegate myself to be a reviewer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the elegant and minimal solution. I think will work just fine to prevent casual flapping in case a validator goes offline at just the right moment.
It does not add any hysteresis and that may be something else to consider adding on top of this, but then again maybe this is sufficient on its own for all but the most extreme flapping cases and there's something to be said about keeping it simple.
else if (!hasValMajority && (majorityTime != NetClock::time_point{})) | ||
else if ( | ||
!hasValMajority && (majorityTime != NetClock::time_point{}) && | ||
!vote->passes(entry.first, vote->deactivationThreshold())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right and should have the same behavior as the existing code, but someone else should triple-check the logic when the amendment being checked is unknown. Here's the logic: hasValMajority
will be false
; before this commit, we'd always enter this block. Now vote->passes
will return false
, and the !
will turn it to true
putting us in this block, just like before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the amendment is unknown, then how would majorityTime
be non-zero? And if it is zero for unknown amendments, then how would this branch have been entered before the change?
src/ripple/protocol/Feature.h
Outdated
@@ -74,7 +74,7 @@ namespace detail { | |||
// Feature.cpp. Because it's only used to reserve storage, and determine how | |||
// large to make the FeatureBitset, it MAY be larger. It MUST NOT be less than | |||
// the actual number of amendments. A LogicError on startup will verify this. | |||
static constexpr std::size_t numFeatures = 53; | |||
static constexpr std::size_t numFeatures = 54; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider ways to remove this altogether. It's a bit of a workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok but not in this amendment i think to keep the code small
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be updated to 55? Was another amendment added while this was being reviewed? CI fails on this number being wrong, and the diff says this new line is no longer a difference.
src/test/app/AmendmentTable_test.cpp
Outdated
// void | ||
// doRound( | ||
// uint256 const& feat, | ||
// AmendmentTable& table, | ||
// weeks week, | ||
// std::vector<std::pair<PublicKey, SecretKey>> const& validators, | ||
// std::vector<std::pair<uint256, int>> const& votes, | ||
// std::vector<uint256>& ourVotes, | ||
// std::set<uint256>& enabled, | ||
// majorityAmendments_t& majority) | ||
// { | ||
// doRound( | ||
// {feat}, | ||
// table, | ||
// week, | ||
// validators, | ||
// votes, | ||
// ourVotes, | ||
// enabled, | ||
// majority); | ||
// } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this commented-out block just be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
c2b2686
to
91ec2fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@a-noni-mousse can you resolve the merge conflicts? thanks! |
OK I have done the resolve |
I guess the downside of this approach is that if exactly one validator permanently withdraws support for an amendment that just barely has supermajority support, then it will not actually lose supermajority support until enough others withdraw their support too. I prefer the idea of tracking votes on-ledger, but I guess the community finds the risk of this approach acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need to update the feature count again: #4364 (comment)
|
My bad, with the close-reopen. Apologies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am against this approach.
It puts more demand on the validators to be stably online all the time. If you're one of the swing votes against an amendment, it could gain the majority while you're having a brief outage, and sustain the lower threshold even after you come back, which is not how the system is intended to work.
This changes the threshold for activating an amendment to "65% and some luck" rather than "sustained 80%". If you were to describe this as simply lowering the threshold to 65%, I think people would rightly be against that; but this is even worse because it's "kind of 65%" but more complicated.
The correct fix for amendment flapping would be to change the activation threshold calculation so that it requires >80% of all validators on the node's UNL (ignoring Negative UNL) rather than >80% of validators currently participating in consensus. Then amendments would only "flap" when validators change their votes, which is the intended behavior.
I'm convinced by @mDuo13 too, but I will comment that using 80% of UNL vs seen validations will not fix the issue either. You'll flap on when you see a yea from 80% the UNL and off when one of those yeas misses a validation. The only way to tell the difference between "validator A is no longer voting yea" and "validator A appears to be offline" is if you see a validation from A with no yea. We're not observing that distinction in the code. If we start to carefully make that distinction, and a yea-voting validator is temporarily offline, should we pretend it is still voting yea? For how long? |
@thejohnfreeman, excellent point and question. My personal take, without trying any implementation, is that a validator should be assumed to be holding the same vote as it held before until a new vote actually arrives from that validator. So if the validator is offline it is assumed to not change its vote. However, that may be hard to implement, I'm not sure. I'm hesitant to armchair quarterback an approach that has the right balance of voting stability vs additional complexity. |
@mDuo13, @thejohnfreeman: excellent points. I had not thought of that when reviewing the code. I'm thinking that this isn't ready for "prime time" as is. |
To elaborate a bit: you can't track individual votes on ledger; whose votes would you track? You can't say "the UNL's" because there isn't one UNL or a requirement that my UNL completely overlaps with yours, and you'd need to reach consensus on which ones to include (and you can't include all of them!). It's for the same reason that you can't just track a "count" of yays or nays, or use the "size of the UNL" instead of the number of validations received. I liked this 'dual threshold' option, but obviously it's problematic for the reasons described above. Given that, I think the best option here maybe the other option (which I think I originally proposed): adding some hysteresis to the "deactivation". If an amendment has gained an 80% supermajority, but subsequently loses it at some round X, set a flag indicating that's the case. If it recovers the 80% supermajority, in round X+1, then all is well; simply remove the flag and assume that it never lost majority. If it is still below the threshold, then just assume the amendment lost majority. The hysteresis can last for a single round or it can be extended to more. |
4f00ce5
to
d264077
Compare
This commit implements a new amendment that follows one of the recommendations in XRPLF#4350 to reduce the flapping of amendments when there is validator outage and support drops below 80% required majority. A new FlappingAmendment vector is used to track amendments that lose majority and add delay of one flag ledger.
d264077
to
931a369
Compare
I have changed the code ot use hysteria instead @nbougalis @scottschurr @thejohnfreeman @mDuo13 by adding new FlappingAmendment vector and when amendment loses a majority if it is not on that list we add and do nothing else. If it is in the list then we do the old code and this causes the delay on one round to the next flag ledger for deactivation. |
// minimum number of votes needed to begin activation countdown | ||
int activationThreshold_ = 0; | ||
|
||
// minimum number of votes needed to continue activate countdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// minimum number of votes needed to continue activate countdown | |
// minimum number of votes needed to continue activation countdown |
else if (!hasValMajority && (majorityTime != NetClock::time_point{})) | ||
else if ( | ||
!hasValMajority && (majorityTime != NetClock::time_point{}) && | ||
!vote->passes(entry.first, vote->deactivationThreshold())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the amendment is unknown, then how would majorityTime
be non-zero? And if it is zero for unknown amendments, then how would this branch have been entered before the change?
auto it = std::find_if( | ||
majorities.begin(), majorities.end(), [&amendment](STObject const& o) { | ||
return o[sfAmendment] == amendment; | ||
}); | ||
|
||
if (it == majorities.end() && lostMajority) | ||
return tefALREADY; | ||
|
||
if (it != majorities.end()) | ||
{ | ||
const STArray& oldMajorities = | ||
amendmentObject->getFieldArray(sfMajorities); | ||
for (auto const& majority : oldMajorities) | ||
if (gotMajority) | ||
return tefALREADY; | ||
|
||
majorities.erase(it); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first time an amendment loses majority, the program will reach line 251, remove the amendment from the sfMajorities
array, and continue, presumably to add it to the sfFlappingAmendments
array. The second time an amendment loses majority, the program will reach line 244 and return, never continuing to fix the sfFlappingAmendments
array. Line 251 needs to be conditional on whether the amendment is in the sfFlappingAmendments
array (which itself is conditional on the amendment).
There does not seem to be a new test for this new behavior. Have you added one?
@a-noni-mousse, I glanced over this pull request and, although it follows the suggestion by @nbougalis, I'm not completely sold on the approach. I'm investigating whether what @mDuo13 suggested is feasible: #4364 (review) If that approach is feasible, then I think that would be the better path. So I'm not ignoring you. Your pull request has prompted further investigation. Thanks for your patience. |
I'm not sure I understand how what @mDuo13 is suggesting would work. He writes:
How will this prevent flapping or does it constitute a "correct fix"? Remember, votes aren't persistent which means that an amendment can flap on and off if as few as 1 validator were to miss a single vote. The problem just isn't the negative UNL; flapping was a problem before that. The problem isn't even that the threshold is calculated based on the number of trusted validators that participated in the vote; if the threshold was calculated against the size of a node's UNL, you'd actually have more flapping. See here. The problem is that there's a very narrow window of voting and a validator needs to be active during that window for it to cast a "yes" vote. Now, this can be solved in one of several ways:
|
Here's an elaboration of my previous comment that I wrote in a conversation with @scottschurr: If we switch to using the full UNL, and ignore the negative UNL (@mDuo13's suggestion), then the problem becomes that a missing validation is interpreted as a "no" vote. Both "yes" and "no" votes are possibilities for the intent of the missing validation, which means whatever assumption we make in its absence carries some risk of being wrong. A corollary is that any "fix" should focus on flapping less, because the goal of reflecting validators' true intentions is impossible to guarantee. The risk of being wrong seems to be smaller if we assume that the vote is unchanged from the validator's previous vote, which means one fix is to forward fill votes. That requires keeping track of old votes, which I don't think we do currently. (It could be done on-chain if each validator was associated with an account and we introduced the capability of recording votes, but I'm not suggesting that we do that.) These are the "sticky" votes @nbougalis is talking about. We update these votes with every pre-flag validation to keep them fresh. Another fix, the hysteresis @nbougalis suggested, will work to reduce flapping. It effectively forward fills "yes" votes (but not "no" votes) for one flag ledger and no more. It does not stop the flapping if the cause is a flaky validator voting "no"1. It becomes more likely to diverge from the validators' true intentions, though, if the outage lasts for more than one flag ledger. Footnotes
|
I've investigated an approach to amendment flapping that I think would address the comment from @mDuo13: #4364 (review) The new approach seems viable, so I've created a pull request for that approach: #4410 In cases like this I usually try to make a change that could be cherry-picked into the existing pull request. I didn't do so this time because there's very little in common between the two approaches. The only thing shared by the two pull requests is the name of the Thanks for your efforts on this pull request, @a-noni-mousse. Even though I've submitted an alternative approach I very much appreciate your efforts here. |
Removed myself as reviewer as I believe that the 4 common reviewers assigned to both this and the #4410 alternative are sufficient. |
We could treat absent/unknown votes as votes not to change the status quo. |
Closing in favor of #4410 |
This commit implements a new amendment that follows one of the recommendations in #4350 to reduce the flapping of amendments when there is validator outage and support drops below 80% required majority.
The new threshold for deactivation of the countdown timer of an an amendment is to go below 65% support.
High Level Overview of Change
Context of Change
Type of Change