
Data loss when upstream transaction conflicts during incremental scan #5468

Closed
overvenus opened this issue May 18, 2022 · 1 comment · Fixed by #5477

@overvenus (Member):

What did you do?

For an UPDATE SQL statement, its prewrite event carries both the value and the old value.
TiDB may prewrite the same row multiple times when the transaction conflicts with
other transactions. In that case, if the value is not "short", only the first
prewrite contains the value.

TiKV may output events for the UPDATE statement as follows:

 TiDB: [Prewrite1]   [Prewrite2]      [Commit]
       v             v                v                                   Time
 ---------------------------------------------------------------------------->
         ^            ^    ^           ^     ^       ^     ^          ^     ^
 TiKV:   [Scan Start] [Send Prewrite2] [Send Commit] [Send Prewrite1] [Send Init]
 TiCDC:                    [Recv Prewrite2]  [Recv Commit] [Recv Prewrite1] [Recv Init]
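
To make the timeline concrete, here is a minimal, hypothetical sketch of the payloads TiCDC receives in that order during the incremental scan. The struct and field names are illustrative, not the real cdcpb definitions: the retried Prewrite2 carries only the old value, while Prewrite1, which arrives later, is the one that carries the value.

package main

import "fmt"

// rowEntry is an illustrative stand-in for a TiKV change-event row,
// not the real cdcpb.Event_Row definition.
type rowEntry struct {
	typ      string // "PREWRITE", "COMMIT", "INITIALIZED"
	startTs  uint64
	commitTs uint64
	key      string
	value    string // new value; empty in the retried prewrite of a non-short value
	oldValue string // previous value
}

func main() {
	// Events in the order TiCDC receives them in the diagram above.
	received := []rowEntry{
		{typ: "PREWRITE", startTs: 100, key: "k1", oldValue: "v_old"},                 // Prewrite2 (retry, value missing)
		{typ: "COMMIT", startTs: 100, commitTs: 110, key: "k1"},                       // Commit
		{typ: "PREWRITE", startTs: 100, key: "k1", value: "v_new", oldValue: "v_old"}, // Prewrite1 (carries the value)
		{typ: "INITIALIZED"},                                                          // incremental scan finished
	}
	for _, e := range received {
		fmt.Printf("%+v\n", e)
	}
}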

TiCDC mistakenly outputs an event that contains the old value but does not contain the value.

The event is translated into a DELETE in the sink, so the row is lost.
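
As a hedged illustration of the sink-side effect, the helper below approximates how a row change with an old value but no new value ends up classified as a delete; it is not TiCDC's actual sink code.

package main

import "fmt"

// classify approximates how a sink could map a row change to a DML type.
// It is illustrative only, not TiCDC's real implementation.
func classify(value, oldValue []byte) string {
	switch {
	case len(value) == 0 && len(oldValue) > 0:
		return "DELETE" // the buggy case described above: value dropped, old value kept
	case len(value) > 0 && len(oldValue) > 0:
		return "UPDATE"
	case len(value) > 0 && len(oldValue) == 0:
		return "INSERT"
	default:
		return "UNKNOWN"
	}
}

func main() {
	// The mistakenly assembled event carries only the old value.
	fmt.Println(classify(nil, []byte("v_old"))) // prints DELETE
}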

See lines L718-L743:

case cdcpb.Event_COMMIT:
	w.metrics.metricPullEventCommitCounter.Inc()
	// A commit must come after the region's resolved ts; otherwise it is a bug.
	if entry.CommitTs <= state.lastResolvedTs {
		logPanic("The CommitTs must be greater than the resolvedTs",
			zap.String("EventType", "COMMIT"),
			zap.Uint64("CommitTs", entry.CommitTs),
			zap.Uint64("resolvedTs", state.lastResolvedTs),
			zap.Uint64("regionID", regionID))
		return errUnreachable
	}
	// Pair the commit with a previously cached prewrite for the same row.
	ok := state.matcher.matchRow(entry)
	if !ok {
		if !state.initialized {
			// The incremental scan has not finished; cache the commit row
			// and try to match it again later.
			state.matcher.cacheCommitRow(entry)
			continue
		}
		return cerror.ErrPrewriteNotMatch.GenWithStackByArgs(
			hex.EncodeToString(entry.GetKey()),
			entry.GetStartTs(), entry.GetCommitTs(),
			entry.GetType(), entry.GetOpType())
	}
	// Assemble the row changed event from the matched entry. In the scenario
	// above, the matched prewrite is the one without the value, so the event
	// carries only the old value.
	revent, err := assembleRowEvent(regionID, entry)
	if err != nil {
		return errors.Trace(err)
	}
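
For context, the snippet above relies on the kv client's matcher pairing a COMMIT entry with a previously received prewrite for the same key and start ts. The sketch below is a hypothetical simplification of that pairing, not the real matcher code; replaying the event order from the diagram shows the commit being matched against the valueless Prewrite2, so the assembled row event carries only the old value.

package main

import "fmt"

// entry is an illustrative row event, not the real cdcpb type.
type entry struct {
	startTs  uint64
	key      string
	value    string
	oldValue string
}

// matcher is a hypothetical simplification of prewrite/commit matching:
// unmatched prewrites are cached by (key, startTs), and a commit takes its
// value/old value from whichever prewrite is cached at that moment.
type matcher struct {
	prewrites map[string]entry
}

func mkey(key string, startTs uint64) string { return fmt.Sprintf("%s@%d", key, startTs) }

func (m *matcher) putPrewrite(e entry) { m.prewrites[mkey(e.key, e.startTs)] = e }

func (m *matcher) matchRow(commit *entry) bool {
	k := mkey(commit.key, commit.startTs)
	p, ok := m.prewrites[k]
	if !ok {
		return false
	}
	delete(m.prewrites, k)
	commit.value, commit.oldValue = p.value, p.oldValue
	return true
}

func main() {
	m := &matcher{prewrites: map[string]entry{}}

	// During the incremental scan only Prewrite2 (no value) has arrived.
	m.putPrewrite(entry{startTs: 100, key: "k1", oldValue: "v_old"})

	commit := entry{startTs: 100, key: "k1"}
	if m.matchRow(&commit) {
		// The matched row has an old value but an empty value, which the
		// sink later treats as a DELETE: the data loss described above.
		fmt.Printf("matched row: value=%q oldValue=%q\n", commit.value, commit.oldValue)
	}
}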

What did you expect to see?

No data loss

What did you see instead?

Data is lost.

Versions of the cluster

TiCDC version (execute cdc version):

v5.0.1

overvenus added the type/bug, component/kv-client, severity/critical, area/ticdc, affects-4.0, affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4, affects-6.0, and affects-6.1 labels on May 18, 2022
@nongfushanquan (Contributor):

/assign overvenus
