schedule pendding online chunkserver #252

Merged (1 commit) on Jun 10, 2021

Contributor

@cw123 cw123 commented Feb 3, 2021

What is changed and how it works?

What's Changed:
Copysets are scheduled out of chunkservers that are online and in pending status.

How it Works:
Copysets are scheduled out of chunkservers that are online and in pending status.

Side effects (Breaking backward compatibility? Performance regression?): no

Check List

  • Relevant documentation/comments are changed or added
  • I acknowledge that all my contributions will be made under the project's license

Contributor Author

cw123 commented Feb 4, 2021

recheck

@@ -38,27 +38,84 @@ int CopySetScheduler::Schedule() {
for (auto lid : topo_->GetLogicalpools()) {
res = DoCopySetSchedule(lid);
}

LOG(INFO) << "copysetScheduler end, generate operator " << res;
Contributor

The commit message and the PR title should include the module name, for example:
chunkserver: schedule pendding online chunkserver

Contributor Author

done

}

TEST_F(CopysetSchedulerPOC, test_scatterwith_after_copysetRebalance_5) { //NOLINT
// (translated from Chinese) Test that when two online chunkservers are marked pending, their copysets are migrated away
Contributor

Please write this comment in English.

Contributor Author

done

if (distribute.empty()) {
LOG(WARNING) << "no not-retired chunkserver in topology";
return UNINTIALIZE_ID;
int CopySetScheduler::PenddingCopySetSchedule(const std::map<ChunkServerIdType,
Contributor

Would it be more reasonable to schedule chunkservers in pending status through the RecoverScheduler?

Contributor Author

RecoverScheduler handles copysets that have already lost at least one copy. CopySetScheduler handles copysets that still have all of their copies.
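
The division of responsibility described in this reply can be sketched as follows. This is only an illustration under stated assumptions: `Replica`, `Handler`, and `WhichScheduler` are hypothetical names that do not appear in the PR, and the real schedulers apply more conditions than these two flags.

```cpp
#include <vector>

// Hypothetical, simplified view of one replica of a copyset.
struct Replica {
    bool offline;   // the hosting chunkserver is offline (the copy is lost)
    bool pendding;  // the hosting chunkserver is online but marked pendding
};

// Hypothetical labels for the two schedulers discussed in this thread.
enum class Handler { kRecoverScheduler, kCopySetScheduler };

// A copyset that has lost at least one copy is left to the recover path.
// A copyset whose copies are all still present, including copies that sit
// on pendding chunkservers, stays with the copyset scheduler.
Handler WhichScheduler(const std::vector<Replica> &replicas) {
    for (const auto &r : replicas) {
        if (r.offline) {
            return Handler::kRecoverScheduler;
        }
    }
    return Handler::kCopySetScheduler;
}
```

Under this reading, a replica on an online chunkserver in pending status is not a lost copy, which is why the migration is placed in the copyset scheduler.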

auto chunkserverList = topo_->GetChunkServersInLogicalPool(lid);

std::map<ChunkServerIdType, std::vector<CopySetInfo>> penddingDistribute;
SchedulerHelper::CopySetDistributionInOnlinePenddingChunkServer(
Contributor

Do chunkservers in pending status not need to be sorted?

Contributor Author

Sorting is not necessary in this situation. The purpose is to change peers from pending chunkservers to chunkservers that are online and not pending, so the order of the change-peer operators matters less.

@@ -379,6 +379,69 @@ void SchedulerHelper::CopySetDistributionInOnlineChunkServer(
}
}
}

void SchedulerHelper::CopySetDistributionInOnlinePenddingChunkServer(
Contributor

CopySetDistributionInOnlinePenddingChunkServer is similar to CopySetDistributionInOnlineNormalChunkServer. I recommend distinguishing the status with a parameter. Otherwise it is easy to forget to update one of them later:

void SchedulerHelper::CopySetDistributionChunkServer(
const Status status,
const std::vector &copysetList,
const std::vector &chunkserverList,
std::map<ChunkServerIdType, std::vector> *out) {
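
A minimal, self-contained sketch of what such a status-parameterized helper could look like is given below. The struct layouts, the enum values other than PENDDING, and the name `CollectCopySetDistribution` are assumptions made for illustration. They are not the types or the function that were merged.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the scheduler's types.
using ChunkServerIdType = uint32_t;
enum class ChunkServerStatus { READWRITE, PENDDING, RETIRED };

struct ChunkServerInfo {
    ChunkServerIdType id;
    ChunkServerStatus status;
    bool online;
};

struct CopySetInfo {
    uint32_t id;
    std::vector<ChunkServerIdType> peers;
};

// For every online chunkserver whose status equals `status`, collect the
// copysets that have a replica on it. One function serves both the normal
// and the pendding case, so a later change only has to be made once.
void CollectCopySetDistribution(
    ChunkServerStatus status,
    const std::vector<CopySetInfo> &copysetList,
    const std::vector<ChunkServerInfo> &chunkserverList,
    std::map<ChunkServerIdType, std::vector<CopySetInfo>> *out) {
    assert(out != nullptr);
    out->clear();
    // Register matching chunkservers first, so that a server holding no
    // copysets still appears with an empty list.
    for (const auto &cs : chunkserverList) {
        if (cs.online && cs.status == status) {
            out->emplace(cs.id, std::vector<CopySetInfo>{});
        }
    }
    // Attach each copyset to every registered chunkserver that hosts it.
    for (const auto &copyset : copysetList) {
        for (ChunkServerIdType peer : copyset.peers) {
            auto it = out->find(peer);
            if (it != out->end()) {
                it->second.push_back(copyset);
            }
        }
    }
}
```

A caller would pass ChunkServerStatus::PENDDING to obtain the migration sources and a different status to obtain the candidate targets.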

Contributor Author

done

if (csInfo.IsPendding()) {
continue;
}

Contributor

if (csInfo.IsOffline() || csInfo.IsPendding()) {
continue;
}

Contributor Author

done

// can not transfer to pendding chunkserver
if (csInfo.IsPendding()) {
continue;
}
Contributor

if (csInfo.IsOffline() || csInfo.IsPendding()) {
continue;
}

Contributor Author

The offline and pending cases are handled differently. If the chunkserver is offline, the code breaks out of the while loop. If the chunkserver is pending, it skips only the current iteration with continue.
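
The difference between the two control-flow choices can be illustrated with a small sketch. The loop below is an assumption about the general shape of such a selection loop, not the code under review. `PeerInfo`, `kNoTarget`, and `SelectTransferTarget` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

using ChunkServerIdType = uint32_t;

// Hypothetical sentinel meaning "no usable target was found".
constexpr ChunkServerIdType kNoTarget = 0;

// Hypothetical, simplified chunkserver record.
struct PeerInfo {
    ChunkServerIdType id;
    bool offline;
    bool pendding;
    bool IsOffline() const { return offline; }
    bool IsPendding() const { return pendding; }
};

// Scan candidates in order and return the first usable one.
ChunkServerIdType SelectTransferTarget(const std::vector<PeerInfo> &peers) {
    for (const auto &csInfo : peers) {
        if (csInfo.IsOffline()) {
            // An offline chunkserver ends the whole scan.
            break;
        }
        if (csInfo.IsPendding()) {
            // A pendding chunkserver is skipped, and the scan goes on.
            continue;
        }
        return csInfo.id;
    }
    return kNoTarget;
}
```

If the two checks were merged into a single `IsOffline() || IsPendding()` test followed by continue, a list that starts with an offline chunkserver would keep scanning and could still return a later candidate, which is a different behavior.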

@@ -289,9 +290,16 @@ ChunkServerIdType Scheduler::SelectRedundantReplicaToRemove(
<< cs << " from " << copySetInfo.CopySetInfoStr();
return cs;
}

// chunkserver is not pendding
Contributor

Should this comment read "chunkserver is pendding"?

Contributor Author

done

ApplyOperatorsInOpController(std::set<ChunkServerIdType>{removeOne});
} while (removeOne > 0);
operatorCount = copySetScheduler_->Schedule();
ApplyOperatorsInOpController(csSet);
Contributor

This node is not necessarily removed

Contributor Author

copySetScheduler_->Schedule() returns the operator count rather than the chunkserver to be removed, so ApplyOperatorsInOpController is called with the full chunkserver list instead of removeOne.

@@ -1217,6 +1229,61 @@ TEST_F(CopysetSchedulerPOC, test_scatterwith_after_copysetRebalance_3) { //NOLIN
// mean: 100, variance: 1, standard deviation: 1, max: 101, min: 91
}

TEST_F(CopysetSchedulerPOC, test_scatterwith_after_copysetRebalance_4) { //NOLINT
// (translated from Chinese) Test that when one online chunkserver is marked pending, its copysets are migrated away
Contributor

Please write this comment in English.

Contributor Author

done

@cw123 cw123 force-pushed the schedule_out_date branch 2 times, most recently from 082fd16 to 09a3763 Compare March 16, 2021 07:49
@@ -379,6 +379,43 @@ void SchedulerHelper::CopySetDistributionInOnlineChunkServer(
}
}
}

void SchedulerHelper::CopySetDistributionInOnlineChunkServer(
Member

The function name is inappropriate. Use a verb in the name.

Contributor Author

done

continue;
}

// find one copy set to migrate out from source chunkserver
Contributor

Write "copyset" as one word.

Contributor Author

done


if (AddOperatorAndCreateCopyset(op, info, target)) {
oneRoundGenOp++;
break;
Contributor

Why break here? Does this generate only one operator at a time for each chunkserver? It should be based on concurrency.
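
One way to read this suggestion is to cap the number of operators generated per source chunkserver at a concurrency limit, instead of stopping after the first one. The sketch below is an assumption about that idea, not the merged change. `PickCopySetsToMigrate`, `CopySetId`, and `maxPerChunkServer` are hypothetical names, and the sketch ignores the case where one copyset has replicas on two pending chunkservers.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using ChunkServerIdType = uint32_t;
using CopySetId = uint32_t;

// For each source chunkserver, pick at most `maxPerChunkServer` copysets to
// migrate in this round, rather than breaking after the first one.
std::vector<CopySetId> PickCopySetsToMigrate(
    const std::map<ChunkServerIdType, std::vector<CopySetId>> &distribution,
    std::size_t maxPerChunkServer) {
    std::vector<CopySetId> picked;
    for (const auto &entry : distribution) {
        std::size_t generated = 0;
        for (CopySetId id : entry.second) {
            if (generated >= maxPerChunkServer) {
                break;  // concurrency limit reached for this chunkserver
            }
            picked.push_back(id);
            ++generated;
        }
    }
    return picked;
}
```

With a limit of 1 this degenerates to the behavior the reviewer questions, so the limit rather than an unconditional break decides how many operators a chunkserver receives per round.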

Contributor Author

done

continue;
}

if (item.status != status) {
Contributor

To find chunkservers with the same status, this condition alone is enough.

Contributor Author

done

@cw123 cw123 force-pushed the schedule_out_date branch 2 times, most recently from dd772ff to 390055a Compare June 3, 2021 07:36
Contributor Author

cw123 commented Jun 7, 2021

recheck

@cw123 cw123 force-pushed the schedule_out_date branch from 390055a to bedb694 Compare June 8, 2021 02:34
Contributor Author

cw123 commented Jun 8, 2021

recheck

@cw123 cw123 force-pushed the schedule_out_date branch from bedb694 to a9270a8 Compare June 8, 2021 09:48
distributions->erase(item.info.id);
}

if (item.status == ChunkServerStatus::PENDDING
Contributor

This check is not needed.

@ilixiaocui ilixiaocui merged commit 563c137 into opencurve:master Jun 10, 2021
@cw123 cw123 deleted the schedule_out_date branch June 10, 2021 02:36
ilixiaocui pushed a commit to ilixiaocui/curve that referenced this pull request Feb 6, 2023
Signed-off-by: Fabian Deutsch <fabiand@fedoraproject.org>