Limit amendment flapping (fixes #4350) #4364

a-noni-mousse · 2022-12-05T09:32:27Z

This commit implements a new amendment that follows one of the recommendations in #4350 to reduce the flapping of amendments when there is a validator outage and support drops below the required 80% majority.

The new threshold for deactivating the countdown timer of an amendment is support falling below 65%.
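The two-threshold rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `countdownRunning`, the integer percent arithmetic, and the choice of a strict ">80%" test to start the countdown are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical illustration of a dual-threshold rule; not rippled code.
constexpr int activationPercent = 80;    // support needed to start the countdown
constexpr int deactivationPercent = 65;  // support below this stops the countdown

// Returns whether the activation countdown should be running after this
// round, given whether it was running before.
inline bool
countdownRunning(bool wasRunning, int yesVotes, int validators)
{
    assert(validators > 0 && yesVotes >= 0 && yesVotes <= validators);

    // Compare yesVotes / validators against a percentage without
    // floating point by cross-multiplying.
    bool const aboveActivation =
        yesVotes * 100 > validators * activationPercent;
    bool const atLeastDeactivation =
        yesVotes * 100 >= validators * deactivationPercent;

    // Not running: the higher threshold is needed to start.
    // Already running: keep going until support drops below the lower one.
    return wasRunning ? atLeastDeactivation : aboveActivation;
}
```

With 34 validators, 27 "yes" votes would not start a countdown under these assumptions, but would keep an already running countdown alive.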

High Level Overview of Change

Context of Change

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactor (non-breaking change that only restructures code)
Tests (You added tests for code that already exists, or your new feature included in this PR)
Documentation Updates
Release

nbougalis · 2022-12-06T18:16:02Z

Great, happy to assign this to you and delegate myself to be a reviewer.

nbougalis

I like the elegant and minimal solution. I think it will work just fine to prevent casual flapping in case a validator goes offline at just the right moment.

It does not add any hysteresis and that may be something else to consider adding on top of this, but then again maybe this is sufficient on its own for all but the most extreme flapping cases and there's something to be said about keeping it simple.

src/ripple/app/misc/impl/AmendmentTable.cpp

nbougalis · 2022-12-06T18:49:08Z

src/ripple/app/misc/impl/AmendmentTable.cpp

-        else if (!hasValMajority && (majorityTime != NetClock::time_point{}))
+        else if (
+            !hasValMajority && (majorityTime != NetClock::time_point{}) &&
+            !vote->passes(entry.first, vote->deactivationThreshold()))


I think this is right and should have the same behavior as the existing code, but someone else should triple-check the logic when the amendment being checked is unknown. Here's the logic: hasValMajority will be false; before this commit, we'd always enter this block. Now vote->passes will return false, and the ! will turn it to true putting us in this block, just like before.

If the amendment is unknown, then how would majorityTime be non-zero? And if it is zero for unknown amendments, then how would this branch have been entered before the change?

nbougalis · 2022-12-06T18:49:47Z

src/ripple/protocol/Feature.h

@@ -74,7 +74,7 @@ namespace detail {
 // Feature.cpp. Because it's only used to reserve storage, and determine how
 // large to make the FeatureBitset, it MAY be larger. It MUST NOT be less than
 // the actual number of amendments. A LogicError on startup will verify this.
-static constexpr std::size_t numFeatures = 53;
+static constexpr std::size_t numFeatures = 54;


We should consider ways to remove this altogether. It's a bit of a workaround.

OK, but not in this amendment, I think, to keep the change small.

Does this need to be updated to 55? Was another amendment added while this was being reviewed? CI fails on this number being wrong, and the diff says this new line is no longer a difference.

nbougalis · 2022-12-06T18:51:22Z

src/test/app/AmendmentTable_test.cpp

+    //    void
+    //    doRound(
+    //        uint256 const& feat,
+    //        AmendmentTable& table,
+    //        weeks week,
+    //        std::vector<std::pair<PublicKey, SecretKey>> const& validators,
+    //        std::vector<std::pair<uint256, int>> const& votes,
+    //        std::vector<uint256>& ourVotes,
+    //        std::set<uint256>& enabled,
+    //        majorityAmendments_t& majority)
+    //    {
+    //        doRound(
+    //            {feat},
+    //            table,
+    //            week,
+    //            validators,
+    //            votes,
+    //            ourVotes,
+    //            enabled,
+    //            majority);
+    //    }
+


Should this commented-out block just be removed?

src/test/app/AmendmentTable_test.cpp

nbougalis

LGTM.

intelliot · 2022-12-21T19:39:36Z

@a-noni-mousse can you resolve the merge conflicts? thanks!

a-noni-mousse · 2022-12-23T06:08:18Z

@a-noni-mousse can you resolve the merge conflicts? thanks!

OK, I have resolved the conflicts.

thejohnfreeman · 2023-01-03T17:52:07Z

I guess the downside of this approach is that if exactly one validator permanently withdraws support for an amendment that just barely has supermajority support, then it will not actually lose supermajority support until enough others withdraw their support too. I prefer the idea of tracking votes on-ledger, but I guess the community finds the risk of this approach acceptable.

thejohnfreeman

Might need to update the feature count again: #4364 (comment)

HowardHinnant · 2023-01-03T18:04:06Z

$ rippled --unittests
Logic error: More features defined than allocated. Adjust numFeatures in Feature.h.

nbougalis · 2023-01-04T21:27:56Z

My bad, with the close-reopen. Apologies.

mDuo13

I am against this approach.

It puts more demand on the validators to be stably online all the time. If you're one of the swing votes against an amendment, it could gain the majority while you're having a brief outage, and sustain the lower threshold even after you come back, which is not how the system is intended to work.

This changes the threshold for activating an amendment to "65% and some luck" rather than "sustained 80%". If you were to describe this as simply lowering the threshold to 65%, I think people would rightly be against that; but this is even worse because it's "kind of 65%" but more complicated.

The correct fix for amendment flapping would be to change the activation threshold calculation so that it requires >80% of all validators on the node's UNL (ignoring Negative UNL) rather than >80% of validators currently participating in consensus. Then amendments would only "flap" when validators change their votes, which is the intended behavior.
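To make the difference between the two denominators concrete, here is a minimal sketch. The `Tally` struct and both helper functions are hypothetical names for this example, not rippled APIs, and the strict ">80%" comparison follows the wording of the comment above.

```cpp
#include <cassert>

// Hypothetical helper: strict ">80%" test against a chosen denominator.
inline bool
exceeds80Percent(int yesVotes, int denominator)
{
    assert(denominator > 0 && yesVotes >= 0);
    return yesVotes * 100 > denominator * 80;
}

// Illustrative per-round numbers for one amendment.
struct Tally
{
    int unlSize;      // validators on the node's UNL
    int validations;  // trusted validations actually received this round
    int yesVotes;     // received validations that vote for the amendment
};

// Threshold relative to validators that took part in this round.
inline bool
majorityOverParticipants(Tally const& t)
{
    return exceeds80Percent(t.yesVotes, t.validations);
}

// Threshold relative to the whole UNL, as proposed in the comment above.
inline bool
majorityOverUNL(Tally const& t)
{
    return exceeds80Percent(t.yesVotes, t.unlSize);
}
```

For a UNL of 34 with 27 "yes" votes and one validator missing, the first test passes (27 of 33) while the second does not (27 of 34).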

scottschurr · 2023-01-05T02:04:16Z

I'm with @mDuo13 on this one. I had been writing up some numeric concerns with this pull request. But in light of @mDuo13's comment those numeric issues are irrelevant. Let's not make it harder to run a validator than it already is.

thejohnfreeman · 2023-01-05T14:43:43Z

I'm convinced by @mDuo13 too, but I will comment that using 80% of the UNL vs. seen validations will not fix the issue either. You'll flap on when you see a yea from 80% of the UNL and off when one of those yeas misses a validation.

The only way to tell the difference between "validator A is no longer voting yea" and "validator A appears to be offline" is if you see a validation from A with no yea. We're not observing that distinction in the code. If we start to carefully make that distinction, and a yea-voting validator is temporarily offline, should we pretend it is still voting yea? For how long?

scottschurr · 2023-01-05T17:05:59Z

@thejohnfreeman, excellent point and question.

My personal take, without trying any implementation, is that a validator should be assumed to be holding the same vote as it held before until a new vote actually arrives from that validator. So if the validator is offline it is assumed to not change its vote. However, that may be hard to implement, I'm not sure. I'm hesitant to armchair quarterback an approach that has the right balance of voting stability vs additional complexity.

nbougalis · 2023-01-06T18:17:08Z

@mDuo13, @thejohnfreeman: excellent points. I had not thought of that when reviewing the code. I'm thinking that this isn't ready for "prime time" as is.

nbougalis · 2023-01-07T05:11:31Z

To elaborate a bit: you can't track individual votes on ledger; whose votes would you track? You can't say "the UNL's" because there isn't one UNL or a requirement that my UNL completely overlaps with yours, and you'd need to reach consensus on which ones to include (and you can't include all of them!). It's for the same reason that you can't just track a "count" of yays or nays, or use the "size of the UNL" instead of the number of validations received.

I liked this 'dual threshold' option, but obviously it's problematic for the reasons described above. Given that, I think the best option here may be the other option (which I think I originally proposed): adding some hysteresis to the "deactivation".

If an amendment has gained an 80% supermajority, but subsequently loses it at some round X, set a flag indicating that's the case. If it recovers the 80% supermajority in round X+1, then all is well; simply remove the flag and assume that it never lost majority. If it is still below the threshold in round X+1, then just assume the amendment lost majority.

The hysteresis can last for a single round or it can be extended to more.
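The hysteresis described above could look something like the following sketch. It is not the code from this PR: `HysteresisState`, `applyRound`, and the `graceRounds` parameter are names invented for the illustration, with `graceRounds = 1` corresponding to the single-round case.

```cpp
#include <cassert>

// Hypothetical per-amendment state; names are illustrative only.
struct HysteresisState
{
    bool holdsMajority = false;  // majority as currently recorded
    int missedRounds = 0;        // consecutive rounds below the threshold
};

// Applies the result of one voting round. `graceRounds` is the number of
// consecutive below-threshold rounds that are forgiven before the
// recorded majority is dropped.
inline void
applyRound(HysteresisState& s, bool aboveThreshold, int graceRounds = 1)
{
    assert(graceRounds >= 0);

    if (aboveThreshold)
    {
        // Majority gained or recovered: clear the "flag".
        s.holdsMajority = true;
        s.missedRounds = 0;
        return;
    }

    if (!s.holdsMajority)
        return;  // nothing to lose

    // Below the threshold: tolerate up to graceRounds misses in a row.
    if (++s.missedRounds > graceRounds)
    {
        s.holdsMajority = false;
        s.missedRounds = 0;
    }
}
```

A single missed round leaves `holdsMajority` set; a second consecutive miss clears it. Passing a larger `graceRounds` extends the grace period.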

This commit implements a new amendment that follows one of the recommendations in XRPLF#4350 to reduce the flapping of amendments when there is a validator outage and support drops below the required 80% majority. A new FlappingAmendment vector is used to track amendments that lose majority and to add a delay of one flag ledger.

a-noni-mousse · 2023-01-15T08:09:21Z

I have changed the code to use hysteresis instead (@nbougalis @scottschurr @thejohnfreeman @mDuo13) by adding a new FlappingAmendment vector. When an amendment loses its majority and is not yet on that list, we add it and do nothing else. If it is already on the list, we run the old code. This delays deactivation by one round, until the next flag ledger.

thejohnfreeman · 2023-01-25T19:44:11Z

src/ripple/app/misc/impl/AmendmentTable.cpp

+    // minimum number of votes needed to begin activation countdown
+    int activationThreshold_ = 0;
+
+    // minimum number of votes needed to continue activate countdown


Suggested change

-    // minimum number of votes needed to continue activate countdown
+    // minimum number of votes needed to continue activation countdown

thejohnfreeman · 2023-01-25T19:47:22Z

src/ripple/app/misc/impl/AmendmentTable.cpp

-        else if (!hasValMajority && (majorityTime != NetClock::time_point{}))
+        else if (
+            !hasValMajority && (majorityTime != NetClock::time_point{}) &&
+            !vote->passes(entry.first, vote->deactivationThreshold()))


If the amendment is unknown, then how would majorityTime be non-zero? And if it is zero for unknown amendments, then how would this branch have been entered before the change?

thejohnfreeman · 2023-01-25T19:58:28Z

src/ripple/app/tx/impl/Change.cpp

+    auto it = std::find_if(
+        majorities.begin(), majorities.end(), [&amendment](STObject const& o) {
+            return o[sfAmendment] == amendment;
+        });
+
+    if (it == majorities.end() && lostMajority)
+        return tefALREADY;
+
+    if (it != majorities.end())
    {
-        const STArray& oldMajorities =
-            amendmentObject->getFieldArray(sfMajorities);
-        for (auto const& majority : oldMajorities)
+        if (gotMajority)
+            return tefALREADY;
+
+        majorities.erase(it);
+    }


The first time an amendment loses majority, the program will reach line 251, remove the amendment from the sfMajorities array, and continue, presumably to add it to the sfFlappingAmendments array. The second time an amendment loses majority, the program will reach line 244 and return, never continuing to fix the sfFlappingAmendments array. Line 251 needs to be conditional on whether the amendment is in the sfFlappingAmendments array (which itself is conditional on the amendment).

There does not seem to be a new test for this new behavior. Have you added one?

scottschurr · 2023-01-25T20:11:58Z

@a-noni-mousse, I glanced over this pull request and, although it follows the suggestion by @nbougalis, I'm not completely sold on the approach. I'm investigating whether what @mDuo13 suggested is feasible: #4364 (review) If that approach is feasible, then I think that would be the better path.

So I'm not ignoring you. Your pull request has prompted further investigation. Thanks for your patience.

nbougalis · 2023-01-27T16:00:06Z

@a-noni-mousse, I glanced over this pull request and, although it follows the suggestion by @nbougalis, I'm not completely sold on the approach. I'm investigating whether what @mDuo13 suggested is feasible: #4364 (review) If that approach is feasible, then I think that would be the better path.

I'm not sure I understand how what @mDuo13 is suggesting would work. He writes:

The correct fix for amendment flapping would be to change the activation threshold calculation so that it requires >80% of all validators on the node's UNL (ignoring Negative UNL) rather than >80% of validators currently participating in consensus. Then amendments would only "flap" when validators change their votes, which is the intended behavior.

How will this prevent flapping, and how does it constitute a "correct fix"? Remember, votes aren't persistent, which means that an amendment can flap on and off if as few as one validator were to miss a single vote.

The problem isn't just the negative UNL; flapping was a problem before that. The problem isn't even that the threshold is calculated based on the number of trusted validators that participated in the vote; if the threshold was calculated against the size of a node's UNL, you'd actually have more flapping. See here.

The problem is that there's a very narrow window of voting and a validator needs to be active during that window for it to cast a "yes" vote.

Now, this can be solved in one of several ways:

Make validator votes "sticky." That is, if a validator voted "yes" once, then that "yes" persists until the validator votes "no". That has several problems, some of which I've explained before. At a minimum, the votes need to be persisted. You can't persist them on the ledger itself, which means you need to persist them locally. But if you do that, you need a mechanism to check if the vote you've persisted is stale, and if you don't, you're calculating the threshold incorrectly.
Add hysteresis, which the latest commit from OP does. This doesn't entirely prevent flapping, but it effectively 'extends' a grace period by "persisting" the majority (as opposed to persisting the vote) until the next ballot.
Allow a validator to vote at any time prior to the tally. This is interesting but still won't entirely prevent flapping either.

thejohnfreeman · 2023-01-27T19:33:21Z

Here's an elaboration of my previous comment that I wrote in a conversation with @scottschurr:

If we switch to using the full UNL, and ignore the negative UNL (@mDuo13's suggestion), then the problem becomes that a missing validation is interpreted as a "no" vote. Both "yes" and "no" votes are possibilities for the intent of the missing validation, which means whatever assumption we make in its absence carries some risk of being wrong. A corollary is that any "fix" should focus on flapping less, because the goal of reflecting validators' true intentions is impossible to guarantee.

The risk of being wrong seems to be smaller if we assume that the vote is unchanged from the validator's previous vote, which means one fix is to forward fill votes. That requires keeping track of old votes, which I don't think we do currently. (It could be done on-chain if each validator was associated with an account and we introduced the capability of recording votes, but I'm not suggesting that we do that.) These are the "sticky" votes @nbougalis is talking about. We update these votes with every pre-flag validation to keep them fresh.
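A forward-fill of votes along these lines might be sketched as below. This is only an illustration of the idea: the `StickyVotes` class, the `Observed` enum, and the use of plain strings as validator identifiers are assumptions for the example, and staleness handling is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <string>

// What was seen from a validator before a flag ledger (hypothetical).
enum class Observed { Yes, No, Absent };

// Hypothetical tracker that remembers each validator's last explicit vote.
class StickyVotes
{
    std::map<std::string, bool> lastVote_;  // validator -> last seen vote

public:
    // Record one observation. Absent means no validation was seen, so
    // the previously recorded vote (if any) is left unchanged.
    void
    observe(std::string const& validator, Observed seen)
    {
        assert(!validator.empty());
        if (seen == Observed::Absent)
            return;
        lastVote_[validator] = (seen == Observed::Yes);
    }

    // Count "yes" votes, including forward-filled ones.
    int
    yesCount() const
    {
        int n = 0;
        for (auto const& entry : lastVote_)
            if (entry.second)
                ++n;
        return n;
    }
};
```

Here an absent validator keeps contributing its last vote, while a validation that arrives without a "yes" replaces it.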

Another fix, the hysteresis @nbougalis suggested, will work to reduce flapping. It effectively forward fills "yes" votes (but not "no" votes) for one flag ledger and no more. It does not stop the flapping if the cause is a flaky validator voting "no"¹. It becomes more likely to diverge from the validators' true intentions, though, if the outage lasts for more than one flag ledger.

¹ For example, assume you have 27 of 34 validators voting "yes", and one of the "no" voters disappears. 27 of 33 validators reaches majority. Then the "no" voter returns, and majority is lost two flag ledgers later. Flapping was not stopped in this situation.
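The arithmetic in this example can be checked at compile time. The helper name `isSupermajority` and the strict ">80%" comparison are assumptions for the sketch, not rippled code.

```cpp
// Strict ">80%" test using integer arithmetic (hypothetical helper).
constexpr bool
isSupermajority(int yesVotes, int validations)
{
    return yesVotes * 100 > validations * 80;
}

// All 34 validators present: 27/34 is about 79.4%, so no majority.
static_assert(!isSupermajority(27, 34), "27 of 34 is below 80%");

// One "no" voter offline: 27/33 is about 81.8%, so majority is reached.
static_assert(isSupermajority(27, 33), "27 of 33 exceeds 80%");
```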

scottschurr · 2023-02-03T20:05:42Z

I've investigated an approach to amendment flapping that I think would address the comment from @mDuo13: #4364 (review)

The new approach seems viable, so I've created a pull request for that approach: #4410

In cases like this I usually try to make a change that could be cherry-picked into the existing pull request. I didn't do so this time because there's very little in common between the two approaches. The only thing shared by the two pull requests is the name of the Feature. We'll let reviewers choose which approach they prefer.

Thanks for your efforts on this pull request, @a-noni-mousse. Even though I've submitted an alternative approach I very much appreciate your efforts here.

HowardHinnant · 2023-03-20T20:45:45Z

Removed myself as reviewer as I believe that the 4 common reviewers assigned to both this and the #4410 alternative are sufficient.

JoelKatz · 2023-05-25T20:35:29Z

We could treat absent/unknown votes as votes not to change the status quo.
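One possible reading of this suggestion, sketched under stated assumptions: absent validators are counted on whichever side preserves the amendment's current state. The `RoundTally` struct, the `hasMajority` function, and the strict ">80%" test over the full set of validators are hypothetical choices for this example.

```cpp
#include <cassert>

// Hypothetical tally for one amendment in one round.
struct RoundTally
{
    int yes;     // explicit "yes" votes seen
    int no;      // validations seen without a "yes"
    int absent;  // validators from which no validation was seen
};

// Decides majority, counting absent validators as voting for the
// status quo: "yes" if the amendment already held majority, "no" if not.
inline bool
hasMajority(RoundTally const& t, bool hadMajority)
{
    int const total = t.yes + t.no + t.absent;
    assert(total > 0);

    int const effectiveYes = hadMajority ? t.yes + t.absent : t.yes;
    return effectiveYes * 100 > total * 80;
}
```

With 27 "yes", 6 "no", and 1 absent, an amendment that already held majority keeps it, while one that did not hold majority does not gain it.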

intelliot · 2023-10-20T18:32:41Z

Closing in favor of #4410

nbougalis requested review from nbougalis, RichardAH and thejohnfreeman December 6, 2022 18:16

nbougalis linked an issue Dec 6, 2022 that may be closed by this pull request

Prevent amendment majority "flapping" (Version: 1.9.4) #4350

Closed

nbougalis mentioned this pull request Dec 6, 2022

Prevent amendment majority "flapping" (Version: 1.9.4) #4350

Closed

nbougalis suggested changes Dec 6, 2022

a-noni-mousse force-pushed the fix4350 branch from c2b2686 to 91ec2fc Compare December 8, 2022 21:34

intelliot added Amendment Testable labels Dec 9, 2022

nbougalis approved these changes Dec 9, 2022

intelliot requested review from scottschurr, gregtatcam and HowardHinnant December 19, 2022 22:42

intelliot added High Priority Will Need Documentation labels Jan 3, 2023

intelliot requested a review from mDuo13 January 3, 2023 17:25

thejohnfreeman approved these changes Jan 3, 2023

thejohnfreeman requested changes Jan 3, 2023

nbougalis closed this Jan 4, 2023

nbougalis reopened this Jan 4, 2023

mDuo13 requested changes Jan 5, 2023

a-noni-mousse force-pushed the fix4350 branch from 4f00ce5 to d264077 Compare January 15, 2023 08:04

a-noni-mousse force-pushed the fix4350 branch from d264077 to 931a369 Compare January 15, 2023 08:08

intelliot requested a review from thejohnfreeman January 18, 2023 19:27

intelliot assigned thejohnfreeman and scottschurr Jan 18, 2023

thejohnfreeman requested changes Jan 25, 2023

scottschurr mentioned this pull request Feb 3, 2023

Fix amendment majority flapping: use a more stable threshold for the number of votes required; when missing STValidation, use the last vote seen #4410

Merged

intelliot removed High Priority labels Feb 7, 2023

HowardHinnant removed their request for review March 20, 2023 20:44

intelliot requested review from ChronusZ and removed request for RichardAH April 26, 2023 16:30

intelliot closed this Oct 20, 2023
