Copyset is not automatically migrated after stopping metaserver #1382

Closed
YunhuiChen opened this issue Apr 29, 2022 · 9 comments
Labels: bug (Something isn't working), high (high priority), need test (Completion of development, requires QA verification)

Comments

@YunhuiChen
Contributor

Describe the bug

Copyset is not automatically migrated after stopping metaserver.

To Reproduce
1. Stop the metaserver.
2. Wait 4 hours.
3. (image)

Expected behavior

Versions
OS:
Compiler:
branch:
commit id:

Additional context/screenshots

@YunhuiChen YunhuiChen added the bug Something isn't working label Apr 29, 2022
@YunhuiChen YunhuiChen added this to the Curve-2.2.0-beta milestone Apr 29, 2022
@cw123
Contributor

cw123 commented Apr 29, 2022

I 2022-04-29T12:02:54.543547+0800 80 recoverScheduler.cpp:34] recoverScheduler begin.
I 2022-04-29T12:02:54.543730+0800 80 operatorControllerTemplate.h:141] add operator [startEpoch: 3, copysetID: (1,5), priority: 2, step: change peer from 5 to 6] fail because of oncurrency exceed
W 2022-04-29T12:02:54.543761+0800 80 recoverScheduler.cpp:95] recover scheduler add operator [startEpoch: 3, copysetID: (1,5), priority: 2, step: change peer from 5 to 6] on [copysetId:(1,5), epoch:3, leader:2, peers:(1,2,5,), canidate:0, has configChangeInfo:0] fail
I 2022-04-29T12:02:54.543777+0800 80 operatorControllerTemplate.h:141] add operator [startEpoch: 3, copysetID: (1,6), priority: 2, step: change peer from 5 to 6] fail because of oncurrency exceed
W 2022-04-29T12:02:54.543783+0800 80 recoverScheduler.cpp:95] recover scheduler add operator [startEpoch: 3, copysetID: (1,6), priority: 2, step: change peer from 5 to 6] on [copysetId:(1,6), epoch:3, leader:4, peers:(2,4,5,), canidate:0, has configChangeInfo:0] fail

@cw123
Contributor

cw123 commented Apr 29, 2022

The recover scheduler generates the operator, but adding it fails because the operator concurrency limit is exceeded.
The concurrency is set to 1 in mds.conf:
mds.schduler.operator.concurrent=1
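
To illustrate why the log shows "add operator ... fail", here is a minimal sketch of a controller that caps in-flight operators. The class names, method names, and the copyset ID (1, 1) are hypothetical and are not Curve's actual code; the real MDS may count concurrency differently (for example per server rather than globally).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for a scheduler operator (not Curve's class)."""
    copyset_id: tuple  # (pool id, copyset id), e.g. (1, 5)
    step: str


@dataclass
class OperatorController:
    """Toy controller that caps the number of in-flight operators."""
    concurrent: int  # plays the role of mds.schduler.operator.concurrent
    operators: dict = field(default_factory=dict)

    def add_operator(self, op: Operator) -> bool:
        # Reject the new operator once the cap is reached.
        if len(self.operators) >= self.concurrent:
            return False
        self.operators[op.copyset_id] = op
        return True

    def remove_operator(self, copyset_id: tuple) -> None:
        # Finishing (or timing out) an operator frees its slot.
        self.operators.pop(copyset_id, None)


controller = OperatorController(concurrent=1)
# Assumed: some earlier operator already occupies the single slot.
first = controller.add_operator(Operator((1, 1), "change peer from 5 to 6"))
# With concurrent=1, every further operator is rejected until the slot frees.
second = controller.add_operator(Operator((1, 5), "change peer from 5 to 6"))
controller.remove_operator((1, 1))
third = controller.add_operator(Operator((1, 5), "change peer from 5 to 6"))
```

In this sketch, an operator that never completes holds the only slot indefinitely, so every later recover operator is rejected in the same way.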

@cw123
Contributor

cw123 commented Apr 29, 2022

image

@cw123
Contributor

cw123 commented May 5, 2022

After 6 days, the copyset configuration change is still stuck here:
image

@cw123
Contributor

cw123 commented May 5, 2022

In the MDS log, the heartbeat reports this error:
W 2022-05-05T17:43:33.544570+0800 106 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 1
W 2022-05-05T17:43:37.469429+0800 100 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 4
W 2022-05-05T17:43:42.525686+0800 90 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 2
W 2022-05-05T17:43:43.544281+0800 98 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 1
W 2022-05-05T17:43:47.469271+0800 94 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 4
W 2022-05-05T17:43:52.526186+0800 25 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 2
W 2022-05-05T17:43:53.545147+0800 107 heartbeat_manager.cpp:233] leader not found, poolid: 1, copysetid: 1

@cw123
Contributor

cw123 commented May 5, 2022

The metaserver has many of these log entries:
image

@wuhongsong wuhongsong added the high high priority label May 9, 2022
@wu-hanqing
Contributor

This problem was introduced by #1211. When dispatching a configuration change to a metaserver, MDS first checks that the target copyset is not in the creating state.

However, #1211 doesn't reset the creating state, so the configuration change is never dispatched.
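
The effect of a creating flag that is never reset can be sketched as follows. This is an illustrative model only: `Copyset`, `dispatch_config_change`, and `finish_creation` are hypothetical names, not functions from the Curve code base, and the thread does not describe how #1417 implements the fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Copyset:
    """Hypothetical MDS-side view of a copyset (not Curve's class)."""
    copyset_id: tuple
    creating: bool = False  # set while the copyset is being created
    pending_change: Optional[str] = None  # configuration change awaiting dispatch


def dispatch_config_change(copyset: Copyset) -> Optional[str]:
    """Return the change to send to the metaserver, or None if held back."""
    # Guard described in the comment above: never dispatch to a copyset
    # that is still marked as creating.
    if copyset.creating:
        return None
    return copyset.pending_change


def finish_creation(copyset: Copyset) -> None:
    # The step that, per the comment above, was missing: reset the flag.
    copyset.creating = False


stuck = Copyset((1, 5), creating=True, pending_change="change peer from 5 to 6")
before = dispatch_config_change(stuck)  # held back while creating is set
finish_creation(stuck)
after = dispatch_config_change(stuck)   # dispatched once the flag is reset
```

As long as the flag stays set, every heartbeat round takes the early return, which matches a configuration change that stays stuck for days.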

@cw123
Contributor

cw123 commented May 10, 2022

> This problem was introduced by #1211. When dispatching a configuration change to a metaserver, MDS first checks that the target copyset is not in the creating state.
>
> However, #1211 doesn't reset the creating state, so the configuration change is never dispatched.

Will fix it in #1417.

@cw123
Contributor

cw123 commented May 12, 2022

Fixed; merged to master in #1417.

@cw123 cw123 added the need test Completion of development, requires QA verification label May 13, 2022