storage: *roachpb.RaftGroupDeletedError for RHS in splitPostApply #21146
The scenario here should be something like …
In the error, the split lock has previously been acquired. This implies that we already went through the following code:

```go
func (s *Store) tryGetOrCreateReplica(
	ctx context.Context,
	rangeID roachpb.RangeID,
	replicaID roachpb.ReplicaID,
	creatingReplica *roachpb.ReplicaDescriptor,
) (_ *Replica, created bool, _ error) {
	// The common case: look up an existing (initialized) replica.
	if value, ok := s.mu.replicas.Load(int64(rangeID)); ok {
		// ...
		repl.setReplicaIDRaftMuLockedMuLocked(replicaID)
		// ...
		return
	}
	// No replica currently exists, so we'll try to create one. Before creating
	// the replica, see if there is a tombstone which would indicate that this is
	// a stale message.
	tombstoneKey := keys.RaftTombstoneKey(rangeID)
	var tombstone roachpb.RaftTombstone
	if ok, err := engine.MVCCGetProto(
		ctx, s.Engine(), tombstoneKey, hlc.Timestamp{}, true, nil, &tombstone,
	); err != nil {
		return nil, false, err
	} else if ok {
		if replicaID != 0 && replicaID < tombstone.NextReplicaID {
			return nil, false, &roachpb.RaftGroupDeletedError{}
		}
	}
```

so we won't be checking tombstones in that case (fourth line from the bottom). Similarly …. So it's possible to acquire the …. Then, in

```go
if replicaID < r.mu.minReplicaID {
	return &roachpb.RaftGroupDeletedError{}
}
```

I think the right thing to do is to pass the proper …. That shifts the possible occurrence of the error to the acquisition of the split lock. What to do in that case? If we get …? We could also leave the error to bubble up at the end, but interpret it in ….

@bdarnell I won't get to this any time soon, but it does seem important to fix. |
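To restate the two guards above in isolation, here is a minimal, self-contained Go sketch (toy types and invented values, not the actual cockroach code): a replicaID of zero, which is what the split code path ends up using since it doesn't know the RHS's current replicaID, sails past the tombstone comparison, while the later in-memory check still fires.

```go
package main

import (
	"errors"
	"fmt"
)

var errRaftGroupDeleted = errors.New("raft group deleted")

// checkTombstone mirrors the guard in tryGetOrCreateReplica: a replicaID of
// zero bypasses the tombstone comparison entirely.
func checkTombstone(replicaID, tombstoneNextReplicaID int) error {
	if replicaID != 0 && replicaID < tombstoneNextReplicaID {
		return errRaftGroupDeleted
	}
	return nil
}

// checkMinReplicaID mirrors the later in-memory guard, which does fire.
func checkMinReplicaID(replicaID, minReplicaID int) error {
	if replicaID < minReplicaID {
		return errRaftGroupDeleted
	}
	return nil
}

func main() {
	// The RHS was removed and re-added, so only replicaIDs >= 5 are legal,
	// while the delayed split still carries the stale replicaID 2.
	fmt.Println(checkTombstone(0, 5))    // <nil>: the tombstone check is skipped
	fmt.Println(checkMinReplicaID(2, 5)) // raft group deleted: the stale ID is rejected
}
```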
The complicated part of this is going to be in shortening the LHS. If we can't create the RHS replica, we can't send it through the replica GC queue. We'd need a new code path to destroy the on-disk data that's left over here.
Yeah, all that plumbing may not be necessary. |
In this special case, since we know that the on-disk data is there, can we force-create the replica with its old replicaID (i.e., allow a recreation), and then let the replica GC queue destroy it?

But, looking into this further, it seems that the existence of the tombstone is in itself a bug; this is from …:

```go
if placeholder := s.getOverlappingKeyRangeLocked(desc); placeholder != rep {
	// This is a fatal error because uninitialized replicas shouldn't make it
	// this far. This method will need some changes when we introduce GC of
	// uninitialized replicas.
	s.mu.Unlock()
	log.Fatalf(ctx, "replica %+v unexpectedly overlapped by %+v", rep, placeholder)
}
```

(and that Fatal should fire, since …). And it seems like we actually never queue uninitialized replicas from Raft handling:

```go
func (s *Store) HandleRaftResponse(ctx context.Context, resp *RaftMessageResponse) error {
	ctx = s.AnnotateCtx(ctx)
	repl, replErr := s.GetReplica(resp.RangeID)
	if replErr == nil {
		// Best-effort context annotation of replica.
		ctx = repl.AnnotateCtx(ctx)
	}
	switch val := resp.Union.GetValue().(type) {
	case *roachpb.Error:
		switch tErr := val.GetDetail().(type) {
		case *roachpb.ReplicaTooOldError:
			if replErr != nil {
				// RangeNotFoundErrors are expected here; nothing else is.
				if _, ok := replErr.(*roachpb.RangeNotFoundError); !ok {
					log.Error(ctx, replErr)
				}
				return nil
			}
			// ... now actually replicaGCQueue.Add()
```

Perhaps one of the other callers to …:

```go
func (s *Store) canApplySnapshotLocked(
	ctx context.Context, rangeDescriptor *roachpb.RangeDescriptor,
) (*ReplicaPlaceholder, error) {
	if v, ok := s.mu.replicas.Load(int64(rangeDescriptor.RangeID)); ok &&
		(*Replica)(v).IsInitialized() {
		// We have the range and it's initialized, so let the snapshot through.
		return nil, nil
	}
	// We don't have the range (or we have an uninitialized
	// placeholder). Will we be able to create/initialize it?
	if exRng, ok := s.mu.replicaPlaceholders[rangeDescriptor.RangeID]; ok {
		return nil, errors.Errorf("%s: canApplySnapshotLocked: cannot add placeholder, have an existing placeholder %s", s, exRng)
	}
	if exRange := s.getOverlappingKeyRangeLocked(rangeDescriptor); exRange != nil {
		// We have a conflicting range, so we must block the snapshot.
		// When such a conflict exists, it will be resolved by one range
		// either being split or garbage collected.
		exReplica, err := s.GetReplica(exRange.Desc().RangeID)
		msg := IntersectingSnapshotMsg
		if err != nil {
			log.Warning(ctx, errors.Wrapf(
				err, "unable to look up overlapping replica on %s", exReplica))
		} else {
			inactive := func(r *Replica) bool {
				if r.RaftStatus() == nil {
					return true
				}
				lease, pendingLease := r.GetLease()
				now := s.Clock().Now()
				return !r.IsLeaseValid(lease, now) &&
					(pendingLease == nil || !r.IsLeaseValid(*pendingLease, now))
			}
			// If the existing range shows no signs of recent activity, give it a GC
			// run.
			if inactive(exReplica) {
				if _, err := s.replicaGCQueue.Add(exReplica, replicaGCPriorityCandidate); err != nil {
					log.Errorf(ctx, "%s: unable to add replica to GC queue: %s", exReplica, err)
				} else {
					msg += "; initiated GC:"
				}
			}
```

I don't see how an uninitialized replica will end up in the above code path, because it shouldn't ever overlap an incoming snapshot (it's not in …). There are two more callers to …. I'm left wondering how we wrote a tombstone. Note that we don't actually have to "write a tombstone" for the original error to pop up. We need … |
Yes, we probably could, but that's part of the complexity I was thinking about. Do we just special-case it in memory so we have a Replica object that is forbidden by the on-disk tombstone, or do we delete the tombstone as part of processing the split so we can recreate the replica (and then GC it and recreate the tombstone)? (Now that I've written this out, the latter sounds like a terrible idea.) But hopefully we can forget about that and fix this by preventing the tombstone from being written in the first place. Even if …
This seems plausible. We bump minReplicaID whenever we set our replica ID to a new value: `cockroach/pkg/storage/replica.go`, lines 953 to 955 in 4051ba3
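As a minimal sketch of that interaction, assuming the bump is to the new ID plus one (the linked lines would show the real rule) and using toy types rather than the real `Replica`: once the re-added RHS has made the replica adopt its new, higher ID, the delayed split can no longer install the old one, even though no tombstone was ever written.

```go
package main

import (
	"errors"
	"fmt"
)

var errRaftGroupDeleted = errors.New("raft group deleted")

// toyReplica models only the two fields relevant to this discussion.
type toyReplica struct {
	replicaID    int
	minReplicaID int
}

// setReplicaID sketches the behavior described above: every time the replica
// ID moves to a new value, the floor below which IDs are rejected is bumped,
// and any later attempt to go back below that floor fails.
func (r *toyReplica) setReplicaID(id int) error {
	if id < r.minReplicaID {
		return errRaftGroupDeleted
	}
	if id != r.replicaID {
		r.replicaID = id
		r.minReplicaID = id + 1 // assumed bump for illustration
	}
	return nil
}

func main() {
	r := &toyReplica{}
	fmt.Println(r.setReplicaID(5)) // re-added RHS talks to us first: <nil>
	fmt.Println(r.setReplicaID(2)) // delayed split tries the old ID: raft group deleted
}
```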
|
Paging this issue back in, I'm not sure what I was thinking here: we set replica IDs in response to all Raft messages, not just preemptive snapshots.
So this is a single crash that does not recur (we've seen it occur on three different clusters, and none of them experienced a second error). Does it matter that we forgot our former replica ID? I don't think it does, because the only thing we can do in our uninitialized state is vote, and the votes are correctly persisted in the HardState. I think we can fix this by allowing the replica ID to move backwards in some cases. Either way, I'm downgrading the priority of this since it is infrequent and doesn't cause persistent problems. |
Reproduced in a 32-node import roachtest: #31184 (comment) |
I've been looking at this again for unrelated reasons (investigating the replicaID refactor) and am wondering about … |
(but that property is out the window already, as we share the Raft log and HardState, so conceptually it's really more as if we reused the previous replicaID) |
Taking this a step further, I wonder why we can't always mandate that … |
The whole reason for replicaID's existence is that etcd/raft doesn't handle reuse of node/replica ids. Or more specifically, it assumes that a node ID will never regress in certain ways, even across remove/add cycles. Raft may panic if it sees that replica 5 has acknowledged log position 123 and later asks for a snapshot starting before that point. Inconsistency could result if replica 5 casts different votes in the same term. Maybe there are alternative solutions here that would be less error-prone than changing replicaIDs, though. We already have permanent tombstones; it wouldn't cost much more to keep the HardState around too. And preemptive snapshots might resolve the panic issue (although learner replicas wouldn't, unless we make a larger series of changes to raft to make it less panicky overall). |
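A small toy Go illustration of the double-vote hazard described here (nothing CockroachDB- or etcd-specific, just a made-up `hardState` type): if the durable term/vote pair is discarded when a peer is removed, a peer re-added under the same ID can vote for a second candidate in the same term, which is exactly the split-brain scenario.

```go
package main

import "fmt"

// hardState is a toy stand-in for raft's durable HardState: the highest term
// seen and the candidate voted for in that term.
type hardState struct {
	term int
	vote string
}

// castVote grants the vote only if we haven't already voted for someone else
// in this term. This invariant is what limits each term to one leader.
func (hs *hardState) castVote(term int, candidate string) bool {
	if term > hs.term {
		hs.term, hs.vote = term, candidate
		return true
	}
	if term == hs.term && (hs.vote == "" || hs.vote == candidate) {
		hs.vote = candidate
		return true
	}
	return false
}

func main() {
	hs := &hardState{}
	fmt.Println(hs.castVote(7, "candidate-A")) // true

	// The peer is removed and re-added under the same ID, and its HardState
	// is discarded instead of kept around. The reborn peer happily votes
	// again in term 7, for a different candidate; with enough such peers,
	// two leaders can win the same term.
	reborn := &hardState{}
	fmt.Println(reborn.castVote(7, "candidate-B")) // true, but should have been false
}
```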
Is this really the reason? I think when we apply a node removal we always nuke the progress for the peer: …, and the removal is processed when the command is applied. I suppose theoretically the command could apply on the Raft leader well after it commits, and another leader could step up and re-add the node, which would then contact the old leader and trigger a panic.

But now I had another thought: why is the Raft group keyed on the replicaID in the first place? Assuming we keep everything as is, shouldn't we be able to key the (internal) raft group by storeID? There wouldn't be a change to the external interface, but we'd be freed from the burden of recreating the raft group every time the replicaID changes. We'd still respect deletion tombstones and would tag our replicaID into outgoing Raft messages, so this should not cause any change in functionality, and no migration is needed.

`cockroach/pkg/storage/replica_raft.go`, line 841 in ffe2d1d
The upshot is that debugging gets easier (since peer ID == storeID) and we can potentially reduce some of the locking around the raft group, since its lifetime will be that of the surrounding Replica (mod preemptive snapshot shenanigans). Am I missing some reason for which this won't work?

BTW, re-reading the old RFCs further, the real motivation for replicaIDs seems to have been the need for replica tombstones: `cockroach/docs/RFCS/20150729_replica_tombstone.md`, lines 35 to 47 in 31b2d57
That we went ahead and used the replicaID as the peer ID is sensible, but may have been a bad idea. |
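For concreteness, here is roughly what "key the raft group by storeID" could look like at group-construction time with etcd/raft. This is a sketch under assumptions (the import path varies by etcd release, and the `storeID` value is made up), not the cockroach code; the only change relative to today would be which ID gets handed to raft, while replica tombstones and the replicaID tagged onto outgoing messages would stay as they are.

```go
package main

import (
	"go.etcd.io/etcd/raft/v3" // older releases use go.etcd.io/etcd/raft
)

func main() {
	const storeID = 3 // hypothetical store ID of this node's store

	storage := raft.NewMemoryStorage()
	cfg := &raft.Config{
		// The peer ID handed to raft is the storeID rather than the current
		// replicaID, so the group never needs to be recreated when the
		// replicaID changes.
		ID:              storeID,
		ElectionTick:    10,
		HeartbeatTick:   1,
		Storage:         storage,
		MaxSizePerMsg:   4096,
		MaxInflightMsgs: 256,
	}

	// Single-member group just to show construction; real membership would
	// list the storeIDs of the other replicas' stores.
	n := raft.StartNode(cfg, []raft.Peer{{ID: storeID}})
	defer n.Stop()
}
```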
It's more about votes than log progress. If a node is removed and re-added with the same node id within a single term, it could cast a different vote in that term and elect a second leader, leading to split-brain. That's why I said we'd at least need to keep the HardState around indefinitely. I haven't thought through the log ack issues so I don't know whether that would be enough.
This was concurrent with the discovery that we can't reuse replica IDs for the above split-brain reason. |
Currently, in-memory Replica objects can end up having a replicaID zero. Roughly speaking, this is always the case when a Replica's range descriptor does not contain the Replica's store, though sometimes we do have a replicaID taken from incoming Raft messages (which then won't survive across a restart).

We end up in this unnatural state mostly due to preemptive snapshots, which are a snapshot of the Range state before adding a certain replica, sent to the store that will house that replica once the configuration change to add it has completed. The range descriptor in the snapshot cannot yet assign the Replica a proper replicaID because none has been allocated yet (and this allocation has to be done in the replica change transaction, which hasn't started yet). Even when the configuration change completes and the leader starts "catching up" the preemptive snapshot and informs it of the replicaID, it will take a few moments until the Replica catches up to the log entry that actually updates the descriptor. If the node reboots before that log entry is durably applied, the replicaID will "restart" at zero until the leader contacts the Replica again.

This suggests that preemptive snapshots introduce fundamental complexity which we'd like to avoid - as long as we use preemptive snapshots there will not be sanity in this department. This PR introduces a mechanism which delays the application of preemptive snapshots so that we apply them only when the first request *after* the completed configuration change comes in (at which point a replicaID is present).

Superficially, this seems to solve the above problem (since the Replica will only be instantiated the moment a replicaID is known), though it doesn't do so across restarts. However, if we synchronously persisted (not done in this PR) the replicaID from incoming Raft messages whenever it changed, it seems that we should always be able to assign a replicaID when creating a Replica, even when dealing with descriptors that don't contain the replica itself (since there would've been a Raft message with a replicaID at some point, and we persist that). This roughly corresponds to persisting `Replica.lastToReplica`.

We ultimately want to switch to learner replicas instead of preemptive snapshots. Learner replicas have the advantage that they are always represented in the replica descriptor, and so the snapshot that initializes them will be a proper Raft snapshot containing a descriptor containing the learner Replica itself. However, it's clear that we need to continue supporting preemptive snapshots in 19.2 due to the need to support mixed 19.1/19.2 clusters. This PR in conjunction with persisting the replicaID (and auxiliary work, for example on the split lock which currently also creates a replica with replicaID zero and which we know [has bugs]) should allow us to remove replicaID zero from the code base without waiting out the 19.1 release cycle.

[has bugs]: cockroachdb#21146

Release note: None
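As a thought experiment on the "synchronously persist the replicaID from incoming Raft messages" idea from the message above, here is a minimal Go sketch. The `replicaIDStore` type and the map standing in for the storage engine are inventions for illustration; a real change would write through the same engine that holds the HardState.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaIDStore stands in for the storage engine: it durably records, per
// range, the highest replicaID this store has ever been addressed as
// (roughly, a persisted Replica.lastToReplica).
type replicaIDStore struct {
	mu  sync.Mutex
	ids map[int64]int // rangeID -> last known replicaID
}

// maybePersist is what would run when a Raft message arrives: if the message
// addresses us under a replicaID we haven't recorded yet, persist it before
// acting on the message, so that after a restart we can always (re)create
// the Replica with a nonzero replicaID.
func (s *replicaIDStore) maybePersist(rangeID int64, replicaID int) (persisted bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if replicaID > s.ids[rangeID] {
		s.ids[rangeID] = replicaID
		return true
	}
	return false
}

func main() {
	s := &replicaIDStore{ids: map[int64]int{}}
	fmt.Println(s.maybePersist(12, 3)) // first message after the config change: true
	fmt.Println(s.maybePersist(12, 3)) // subsequent messages: false, nothing to do
}
```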
This last code is the one that matters here (though in principle, if we had a way to replicaGC uninit'ed replicas, we'd also get honest tombstones written). First of all, I was surprised by the following: `cockroach/pkg/storage/replica_init.go`, lines 169 to 181 in 7dcf66d
Anyway, what the split trigger really does in these scenarios is say "well, I have data here for this replica and even if the replica got removed and re-added any number of times, I'm still going to stick this old data in because I know that's safe". The safety comes from knowing that at no point could the replica have held data (other than the HardState, which we preserve). Or, in other words, it's safe to apply snapshots from past incarnations of the replica to a newer one. We just want to bypass the tombstone here. |
Just to complete the thought: we know this because the yet-to-be-split LHS is in the way. This clearly applies to the KV data; I had to think for a minute to convince myself that it works for logs. (The reason is that until we've processed the split, we're at log index zero. We need a snapshot before we can accept any logs, so the LHS blocking the snapshots also blocks any logs.) So bypassing the tombstone check when processing a split sounds good to me. |
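A minimal sketch of "bypass the tombstone check when processing a split", with toy types and an invented `fromSplit` flag rather than the real `tryGetOrCreateReplica` signature: the split trigger may lay down the pre-split RHS data regardless of the tombstone, because the un-split LHS has kept the RHS from ever holding data (other than the HardState, which is preserved).

```go
package main

import (
	"errors"
	"fmt"
)

var errRaftGroupDeleted = errors.New("raft group deleted")

// createReplica models the tombstone guard with one extra, hypothetical
// escape hatch: a creation that comes from applying a split trigger skips
// the comparison, because the data it installs predates every incarnation
// of the RHS and is therefore safe to lay down regardless of tombstones.
func createReplica(replicaID, tombstoneNextReplicaID int, fromSplit bool) error {
	if !fromSplit && replicaID != 0 && replicaID < tombstoneNextReplicaID {
		return errRaftGroupDeleted
	}
	// ... create the replica; a since-re-added RHS would then be caught up
	// or replicaGC'ed through the normal paths.
	return nil
}

func main() {
	// Tombstone says IDs below 5 are dead; the split still carries ID 2.
	fmt.Println(createReplica(2, 5, false)) // regular path: raft group deleted
	fmt.Println(createReplica(2, 5, true))  // split trigger: allowed through
}
```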
The right hand side of a split can be re-added before the split trigger fires, in which case the split trigger fails. See [bug description]. I [suggested] a test to reproduce this bug "properly", so we should look into that. In the meantime, it'll be good to see that this passes tests.

I verified manually that setting `minReplicaID` to some large number before the call to `rightRng.initRaftMuLockedReplicaMuLocked` reproduces the symptoms prior to this commit, but that doesn't come as a surprise, nor does it prove that the fix works flawlessly.

[bug description]: cockroachdb#21146 (comment)
[suggested]: cockroachdb#39034 (comment)

Fixes cockroachdb#21146.

Release note (bug fix): Fixed a rare panic (message: "raft group deleted") that could occur during splits.
Ack, see #39571 |
39571: storage: avoid RaftGroupDeletedError from RHS in splitTrigger r=bdarnell a=tbg

The right hand side of a split can be re-added before the split trigger fires, in which case the split trigger fails. See [bug description]. I [suggested] a test to reproduce this bug "properly", so we should look into that. In the meantime, it'll be good to see that this passes tests.

I verified manually that setting `minReplicaID` to some large number before the call to `rightRng.initRaftMuLockedReplicaMuLocked` reproduces the symptoms prior to this commit, but that doesn't come as a surprise, nor does it prove that the fix works flawlessly.

[bug description]: #21146 (comment)
[suggested]: #39034 (comment)

Fixes #21146.

Release note (bug fix): Fixed a rare panic (message: "raft group deleted") that could occur during splits.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
This verifies the behavior when the application of some split command (part of the lhs's log) is delayed on some store and meanwhile the rhs has rebalanced away and back, ending up with a larger ReplicaID than the split thinks it will have.

Release note: None
39694: storage: add regression test for #21146 r=tbg a=danhhz

This verifies the behavior when the application of some split command (part of the lhs's log) is delayed on some store and meanwhile the rhs has rebalanced away and back, ending up with a larger ReplicaID than the split thinks it will have.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
https://sentry.io/cockroach-labs/cockroachdb/issues/426530382/