-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schedule pendding online chunkserver #252
Conversation
recheck |
1 similar comment
recheck |
@@ -38,27 +38,84 @@ int CopySetScheduler::Schedule() { | |||
for (auto lid : topo_->GetLogicalpools()) { | |||
res = DoCopySetSchedule(lid); | |||
} | |||
|
|||
LOG(INFO) << "copysetScheduler end, generate operator " << res; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format of commit message and title should include module name:
chunkserver: schedule pendding online chunkserver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
TEST_F(CopysetSchedulerPOC, test_scatterwith_after_copysetRebalance_5) { //NOLINT | ||
// 测试两个online的chunkserver 标记为pendding,copyset迁移走 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
English
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if (distribute.empty()) { | ||
LOG(WARNING) << "no not-retired chunkserver in topology"; | ||
return UNINTIALIZE_ID; | ||
int CopySetScheduler::PenddingCopySetSchedule(const std::map<ChunkServerIdType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more reasonable to schedule chunkserver in pendding status by recoverSchedule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RecorveSchedule handle the copyset which aready lost at least one copy. CopysetScheduler handle the copyset which has all copys.
auto chunkserverList = topo_->GetChunkServersInLogicalPool(lid); | ||
|
||
std::map<ChunkServerIdType, std::vector<CopySetInfo>> penddingDistribute; | ||
SchedulerHelper::CopySetDistributionInOnlinePenddingChunkServer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunkserver in pendding status do not need sort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has no necessary to sort in this situation. The purpose is to change peer from pendding chunkserver to online and no pendding chunkserver, the sequence of change peer operator is less important.
@@ -379,6 +379,69 @@ void SchedulerHelper::CopySetDistributionInOnlineChunkServer( | |||
} | |||
} | |||
} | |||
|
|||
void SchedulerHelper::CopySetDistributionInOnlinePenddingChunkServer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CopySetDistributionInOnlinePenddingChunkServer is similar with CopySetDistributionInOnlineNormalChunkServer, It is recommended to distinguish status by parameter. Otherwise, it is easy to forget to modify later
void SchedulerHelper::CopySetDistributionChunkServer(
const Status status,
const std::vector ©setList,
const std::vector &chunkserverList,
std::map<ChunkServerIdType, std::vector> *out) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if (csInfo.IsPendding()) { | ||
continue; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (csInfo.IsOffline() || csInfo.IsPendding()) {
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// can not transfer to pendding chunkserver | ||
if (csInfo.IsPendding()) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (csInfo.IsOffline() || csInfo.IsPendding()) {
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case between offline and pendding is different. If the chunk is offline, it break this while. If the chunk is pendding, it skip this round with continue.
src/mds/schedule/scheduler.cpp
Outdated
@@ -289,9 +290,16 @@ ChunkServerIdType Scheduler::SelectRedundantReplicaToRemove( | |||
<< cs << " from " << copySetInfo.CopySetInfoStr(); | |||
return cs; | |||
} | |||
|
|||
// chunkserver is not pendding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunkserver is pendding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ApplyOperatorsInOpController(std::set<ChunkServerIdType>{removeOne}); | ||
} while (removeOne > 0); | ||
operatorCount = copySetScheduler_->Schedule(); | ||
ApplyOperatorsInOpController(csSet); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This node is not necessarily removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copySetScheduler_->Schedule() returns the operator count rather then the chunkserver want to be removed. So the ApplyOperatorsInOpController use all chunkserver list instead of the removeOne.
@@ -1217,6 +1229,61 @@ TEST_F(CopysetSchedulerPOC, test_scatterwith_after_copysetRebalance_3) { //NOLIN | |||
// 均值:100, 方差:1, 标准差: 1, 最大值: 101, 最小值:91 | |||
} | |||
|
|||
TEST_F(CopysetSchedulerPOC, test_scatterwith_after_copysetRebalance_4) { //NOLINT | |||
// 测试一个online的chunkserver 标记为pendding,copyset迁移走 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
english
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
082fd16
to
09a3763
Compare
@@ -379,6 +379,43 @@ void SchedulerHelper::CopySetDistributionInOnlineChunkServer( | |||
} | |||
} | |||
} | |||
|
|||
void SchedulerHelper::CopySetDistributionInOnlineChunkServer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function naming is inappropriate, use verbs to name it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
continue; | ||
} | ||
|
||
// find one copy set to migrate out from source chunkserver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copyset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
if (AddOperatorAndCreateCopyset(op, info, target)) { | ||
oneRoundGenOp++; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why break? One record at a time for a chunkserver? It Should be based on concurrency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
continue; | ||
} | ||
|
||
if (item.status != status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Find the chunkserver with the same status, this judgment condition is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dd772ff
to
390055a
Compare
recheck |
recheck |
1 similar comment
recheck |
distributions->erase(item.info.id); | ||
} | ||
|
||
if (item.status == ChunkServerStatus::PENDDING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not need
Signed-off-by: Fabian Deutsch <fabiand@fedoraproject.org>
What is changed and how it works?
What's Changed:
schedule out copyset from online and pendding chunkserver
How it Works:
schedule out copyset from online and pendding chunkserver
Side effects(Breaking backward compatibility? Performance regression?): no
Check List