
The CDC client is still using the old PD address #9584

Closed
jacktd9 opened this issue Aug 15, 2023 · 10 comments · Fixed by #9713
Assignees
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. severity/minor type/bug The issue is confirmed as a bug.

Comments

jacktd9 commented Aug 15, 2023

What did you do?

  1. Initially, there were 3 old PD nodes (pd1, pd2, pd3).
  2. A scale-out operation added 3 new PD nodes (pd4, pd5, pd6).
  3. We waited 5 minutes.
  4. A scale-in operation removed the 3 old PD nodes (pd1, pd2, pd3).
  5. We resumed the CDC changefeed (a hedged sketch of the equivalent commands is shown below).
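
For reference, a hedged sketch of roughly what these steps look like on a tiup cluster deployment; the cluster name, topology file, host names, and changefeed ID are placeholders, not taken from the report:

# assumed tiup cluster deployment; all names and addresses below are placeholders
tiup cluster scale-out my-cluster scale-out-pd.yaml                                 # add pd4, pd5, pd6
sleep 300                                                                           # wait ~5 minutes
tiup cluster scale-in my-cluster --node pd1-host:2379,pd2-host:2379,pd3-host:2379   # remove pd1, pd2, pd3
cdc cli changefeed resume --server 127.0.0.1:8300 -c my-changefeed                  # resume the changefeed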

The command failed, and the CDC logs showed that TiCDC was still accessing the old PD nodes.
(screenshot: CDC error log)

current PD address is... 2479
(screenshot)

What did you expect to see?

no error

What did you see instead?

connect pd failed

Versions of the cluster

Cluster version: v6.5.3

@jacktd9 jacktd9 added area/ticdc Issues or PRs related to TiCDC. type/bug The issue is confirmed as a bug. labels Aug 15, 2023
@nongfushanquan
Contributor

/assign @asddongmen

@asddongmen
Contributor

@jacktd9 May I ask if the changefeed has resumed normal synchronization? In other words, was the error log you found temporary or has it not been resolved yet?


jacktd9 commented Aug 17, 2023

  1. When we reloaded CDC and then executed the same 'resume' command again, it succeeded.

  2. Similarly, after CDC had been reloaded as in step 1, we attempted to update the changefeed configuration, but it returned a 500 error. This seemed to be caused by the old PD addresses still stored in the upstream info. After scaling one of the previously removed PD nodes back in and running the same 'update' command again, it succeeded.
(screenshots)


asddongmen commented Aug 18, 2023

> 1. When we reloaded CDC and then executed the same 'resume' command again, it succeeded.
>
> 2. Similarly, after CDC had been reloaded as in step 1, we attempted to update the changefeed configuration, but it returned a 500 error. This seemed to be caused by the old PD addresses still stored in the upstream info. After scaling one of the previously removed PD nodes back in and running the same 'update' command again, it succeeded. (screenshots)

So, if I understand correctly, TiCDC's pdClient is still using the old address and cannot update to the new one?

Based on our discussion and my comprehension, here is a summary:

  1. There are warning logs indicating that pdClient cannot connect to the old PD addresses, but the changefeed can still advance.
  2. After restarting the cdc process, you can successfully execute cdc cli changefeed resume, but you are unable to execute cdc cli changefeed update.
  3. When you scale one of the old PD nodes back into the PD cluster, TiCDC can correctly handle changefeed update requests.

Please correct me if I misunderstood anything. @jacktd9
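
For reference, a hedged sketch of the two cli calls being compared; the server address and changefeed ID are placeholders:

# both commands target the restarted TiCDC node; names are placeholders
cdc cli changefeed resume --server 127.0.0.1:8300 -c my-changefeed     # succeeds after the cdc process is restarted
cdc cli changefeed pause  --server 127.0.0.1:8300 -c my-changefeed     # a changefeed must be paused before it can be updated
cdc cli changefeed update --server 127.0.0.1:8300 -c my-changefeed --sink-uri 'blackhole://'   # returns HTTP 500 while the old PD addresses remain in the upstream info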

@asddongmen asddongmen added severity/minor affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. labels Aug 31, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 16, 2023
@asddongmen
Contributor

Fixed in v6.5.6.


kennytm commented Jan 9, 2025

We have found that it is still not possible to create changefeeds after a scale-out -> scale-in operation. Reproduction:

tiup playground v7.1.5 --db 1 --kv 1 --pd 3 --ticdc 3 --tiflash 0 --without-monitor
# sanity check, ensure that tidb & cdc works
mysql -u root -h 127.1 -P 4000 -e 'select * from mysql.tidb'
tiup cdc:v7.1.5 cli --server 127.0.0.1:8300 changefeed create --sink-uri 'blackhole://' -c test0
tiup cdc:v7.1.5 cli --server 127.0.0.1:8300 changefeed remove -c test0
# perform scale-out (do not scale out all 3 simultaneously!)
tiup playground scale-out --pd 1
tiup playground scale-out --pd 1
tiup playground scale-out --pd 1
# note the PIDs of the first 3 PDs 
tiup playground display 
# perform scale-in
tiup playground scale-in --pid 23397,23398,23399
# sanity check, tidb should still work
mysql -u root -h 127.1 -P 4000 -e 'select * from mysql.tidb'
mysql -u root -h 127.1 -P 4000 -e 'show config where type = "pd" and name = "cluster-version"'
# try to create changefeed again
tiup cdc:v7.1.5 cli --server 127.0.0.1:8300 changefeed create --sink-uri 'blackhole://' -c test1
# ^ the program above is now stuck.
#   cdc log shows a lot of warnings like:
#
#   [2025/01/09 15:24:37.834 +08:00] [WARN] [pd_service_discovery.go:370] ["[pd] failed to get cluster id"] [url=http://127.0.0.1:2382] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2382: connect: connection refused\" target:127.0.0.1:2382 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2382: connect: connection refused\" target:127.0.0.1:2382 status:TRANSIENT_FAILURE"]
#
#   doing Ctrl+C may also give us
#
#   Error: [CDC:ErrNewStore]new store failed: [pd] failed to get cluster id

This is reproduced on v7.1.4, v7.1.6, v8.5.0. I'm going to reopen the issue.
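
As an additional sanity check (a hedged sketch, not part of the original reproduction), pd-ctl can be used to confirm that PD itself now lists only the new members while the cdc cli call above still hangs; the port is a placeholder for the client port of one of the surviving PD nodes shown by tiup playground display:

# pd-ctl via tiup; <surviving-pd-port> is a placeholder
tiup ctl:v7.1.5 pd -u http://127.0.0.1:<surviving-pd-port> member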

@kennytm kennytm reopened this Jan 9, 2025
@github-project-automation github-project-automation bot moved this from Done to In Progress in Question and Bug Reports Jan 9, 2025

lidezhu commented Jan 10, 2025

It seems cdc gets the PD address from command-line arguments.
(screenshot)
So this behavior matches the current expectation.
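
For context, a hedged sketch of how the PD endpoints typically end up on the TiCDC command line; the addresses are placeholders:

# cdc server is started with a fixed list of PD endpoints;
# this list is not refreshed when the PD cluster membership later changes
cdc server \
  --pd=http://pd1-host:2379,http://pd2-host:2379,http://pd3-host:2379 \
  --addr=0.0.0.0:8300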


kennytm commented Jan 10, 2025

@lidezhu No, the current behavior can be explained by the fact that the CDC owner handles the API using the --pd addresses it was initialized with, but that is not the behavior customers expect.
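
For completeness, the workaround implied earlier in this thread is to reload/restart the TiCDC nodes so they start with the current PD endpoint list, e.g. on a tiup cluster deployment (a hedged sketch; the cluster name is a placeholder):

# restarts the cdc nodes with refreshed configuration so they pick up the current PD list
tiup cluster reload my-cluster -R cdc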


lidezhu commented Jan 13, 2025

This appears to be a PD problem: tikv/pd#8993

@lidezhu lidezhu added affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. and removed affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. labels Jan 15, 2025

lidezhu commented Jan 15, 2025

Created a new issue, #12004, to make managing later cherry-picks easier. Closing this one.

@lidezhu lidezhu closed this as completed Jan 15, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Question and Bug Reports Jan 15, 2025