Changefeed with sink-uri=kafka status become failed after all PD restart #2389

Tammyxia · 2021-07-27T11:37:38Z

Bug Report

Please answer these questions before submitting your issue. Thanks!

What did you do? If possible, provide a recipe for reproducing the error.

Set up 2 captures and 2 changefeeds:
Starting component cdc: /root/.tiup/components/cdc/v4.0.14/cdc cli changefeed list --pd=http://172.16.6.28:2379
[
  {
    "id": "kafka-task-11",
    "summary": {
      "state": "normal",
      "tso": 426608082704400445,
      "checkpoint": "2021-07-27 18:11:26.586",
      "error": null
    }
  },
  {
    "id": "replication-task-11",
    "summary": {
      "state": "normal",
      "tso": 426608707681910785,
      "checkpoint": "2021-07-27 18:51:10.686",
      "error": null
    }
  }
]
Then restart all PD nodes: tiup cluster restart 360UP -R pd

What did you expect to see?

No errors.

What did you see instead?

The changefeed with sink-uri=kafka enters the failed state after all PD nodes are restarted:
Starting component cdc: /root/.tiup/components/cdc/v4.0.14/cdc cli changefeed list --pd=http://172.16.6.28:2379
[
  {
    "id": "replication-task-11",
    "summary": {
      "state": "normal",
      "tso": 426608859290271791,
      "checkpoint": "2021-07-27 19:00:49.026",
      "error": null
    }
  },
  {
    "id": "kafka-task-11",
    "summary": {
      "state": "failed",
      "tso": 426608874219372599,
      "checkpoint": "2021-07-27 19:01:45.976",
      "error": {
        "addr": "172.16.6.32:8300",
        "code": "CDC-owner-1001",
        "message": "rpc error: code = Unknown desc = rpc error: code = Unavailable desc = not leader"
      }
    }
  }
]
Checking the changefeed status again shows that the checkpoint of the failed kafka-task-11 is still advancing:
Starting component cdc: /root/.tiup/components/cdc/v4.0.14/cdc cli changefeed list --pd=http://172.16.6.28:2379
[
  {
    "id": "kafka-task-11",
    "summary": {
      "state": "failed",
      "tso": 426609068572934145,
      "checkpoint": "2021-07-27 19:14:07.376",
      "error": {
        "addr": "172.16.6.32:8300",
        "code": "CDC-owner-1001",
        "message": "rpc error: code = Unknown desc = rpc error: code = Unavailable desc = not leader"
      }
    }
  },
  {
    "id": "replication-task-11",
    "summary": {
      "state": "normal",
      "tso": 426609068677791745,
      "checkpoint": "2021-07-27 19:14:07.776",
      "error": null
    }
  }
]
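
The symptom above (a changefeed reported as failed while its checkpoint keeps moving) can be detected by comparing two snapshots of the list output. The following is a minimal Go sketch, assuming only the JSON shape shown in the outputs above. The `changefeed` struct and the `advancingWhileFailed` helper are hypothetical names used for illustration and are not part of TiCDC.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// changefeed mirrors only the fields of the list output that the check needs.
// Hypothetical type for illustration, not a TiCDC type.
type changefeed struct {
	ID      string `json:"id"`
	Summary struct {
		State string `json:"state"`
		TSO   uint64 `json:"tso"`
	} `json:"summary"`
}

// advancingWhileFailed returns the IDs of changefeeds that are "failed" in the
// later snapshot but whose tso is larger than in the earlier snapshot.
func advancingWhileFailed(before, after []byte) ([]string, error) {
	var prev, cur []changefeed
	if err := json.Unmarshal(before, &prev); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(after, &cur); err != nil {
		return nil, err
	}
	last := make(map[string]uint64, len(prev))
	for _, cf := range prev {
		last[cf.ID] = cf.Summary.TSO
	}
	var ids []string
	for _, cf := range cur {
		old, seen := last[cf.ID]
		if seen && cf.Summary.State == "failed" && cf.Summary.TSO > old {
			ids = append(ids, cf.ID)
		}
	}
	return ids, nil
}

func main() {
	// tso values copied from the two failed-state outputs in this report.
	before := []byte(`[{"id":"kafka-task-11","summary":{"state":"failed","tso":426608874219372599}}]`)
	after := []byte(`[{"id":"kafka-task-11","summary":{"state":"failed","tso":426609068572934145}}]`)
	ids, err := advancingWhileFailed(before, after)
	fmt.Println(ids, err)
}
```

The sketch decodes `tso` into a `uint64` so that the large timestamp values are compared exactly rather than through floating point.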

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
```
4.0.14
```

TiCDC version (execute cdc version):

["Welcome to Change Data Capture (CDC)"] [release-version=v4.0.14] [git-hash=5a7851967f686da896b45acd3f3e968bfe53d6bd] [git-branch=heads/refs/tags/v4.0.14]


Tammyxia · 2021-07-27T11:59:26Z

What's more, cli changefeed list doesn't work anymore:

$ tiup cdc:v4.0.14 cli changefeed list --pd=http://172.16.6.24:237
Starting component cdc: /root/.tiup/components/cdc/v4.0.14/cdc cli changefeed list --pd=http://172.16.6.24:237
Error: fail to open PD etcd client, pd="http://172.16.6.24:237": context deadline exceeded
Usage:
  cdc cli changefeed list [flags]

Flags:
  -a, --all    List all replication tasks(including removed and finished)
  -h, --help   help for list

Global Flags:
      --ca string          CA certificate path for TLS connection
      --cert string        Certificate path for TLS connection
  -i, --interact           Run cdc cli with readline
      --key string         Private key path for TLS connection
      --log-level string   log level (etc: debug|info|warn|error) (default "warn")
      --pd string          PD address, use ',' to separate multiple PDs (default "http://127.0.0.1:2379")

fail to open PD etcd client, pd="http://172.16.6.24:237": context deadline exceeded

However, tiup cdc:v4.0.14 cli capture list --pd=http://172.16.6.24:2379 still works.

3AceShowHand · 2021-08-24T17:11:37Z

{
"id": "kafka-task-11",
"summary": {
"state": "failed",
"tso": 426608874219372599,
"checkpoint": "2021-07-27 19:01:45.976",
"error": {
"addr": "172.16.6.32:8300",
"code": "CDC-owner-1001",
"message": "rpc error: code = Unknown desc = rpc error: code = Unavailable desc = not leader"
}
}
}

3AceShowHand · 2021-08-24T17:11:55Z

The checkpoint is still updating after the changefeed failed.

3AceShowHand · 2021-08-24T17:13:49Z

The error may occur while creating the changefeed in newChangefeed.

3AceShowHand · 2021-08-24T17:23:26Z

#942 may still be unsolved: resultErr != nil is always false, so ddlHandler and primarySink are not closed if creating the changefeed fails.
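
To illustrate how a deferred check like this can end up always false, here is a minimal Go sketch. It is not the TiCDC code: `resource`, `buggyNew`, and `fixedNew` are hypothetical names, and the buggy variant shows only one possible cause, namely a deferred cleanup that inspects a variable which the return statements never set.

```go
package main

import (
	"errors"
	"fmt"
)

// resource stands in for something like ddlHandler or primarySink.
// Hypothetical type for illustration only.
type resource struct{ closed bool }

func (r *resource) Close() { r.closed = true }

var errNotLeader = errors.New("not leader")

// buggyNew returns failures through the named result err, but the deferred
// cleanup checks a separate local resultErr that is never assigned.
func buggyNew(r *resource) (ok bool, err error) {
	var resultErr error
	defer func() {
		if resultErr != nil { // always false, so r is never closed
			r.Close()
		}
	}()
	return false, errNotLeader
}

// fixedNew names the error result resultErr, so every return statement sets
// the variable that the deferred cleanup inspects.
func fixedNew(r *resource) (ok bool, resultErr error) {
	defer func() {
		if resultErr != nil {
			r.Close()
		}
	}()
	return false, errNotLeader
}

func main() {
	a, b := &resource{}, &resource{}
	_, _ = buggyNew(a)
	_, _ = fixedNew(b)
	fmt.Println(a.closed, b.closed) // prints "false true"
}
```

In Go, a `return` with explicit values assigns the named results before deferred functions run, which is why using one consistently named error result makes the cleanup reliable.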

3AceShowHand · 2021-08-24T17:38:27Z

The error "rpc error: code = Unknown desc = rpc error: code = Unavailable desc = not leader" may be caused by a call to pdClient or etcdClient.

3AceShowHand · 2021-08-24T17:52:11Z

The problem should already have been fixed by #2370.

I will reproduce the scenario on release-4.0 before trying to fix it.

3AceShowHand · 2021-08-25T12:17:28Z

The problem is already fixed in the release-4.0 branch.

I have manually tested this several times: test-2 was created and run on release-4.0, and it works fine.

Tammyxia added type/bug The issue is confirmed as a bug. severity/major labels Jul 27, 2021

asddongmen added bug-from-internal-test Bugs found by internal testing. component/sink Sink component. difficulty/medium Medium task. labels Jul 28, 2021

overvenus assigned 3AceShowHand Jul 29, 2021

3AceShowHand mentioned this issue Aug 25, 2021

owner: unify the naming of named return value in new changefeed. #2623

Closed

3AceShowHand closed this as completed Sep 8, 2021

AkiraXie added the area/ticdc Issues or PRs related to TiCDC. label Mar 9, 2022
