mds/client: support discard #189

Merged
merged 1 commit into from
May 14, 2021

wu-hanqing
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

What's Changed:

How it Works:

Side effects(Breaking backward compatibility? Performance regression?):

Check List

  • Relevant documentation/comments is changed or added
  • I acknowledge that all my contributions will be made under the project's license

@wu-hanqing wu-hanqing force-pushed the discard-v2 branch 2 times, most recently from e67ade3 to 4617fd2 Compare December 20, 2020 12:50
@wu-hanqing wu-hanqing closed this Dec 20, 2020
@wu-hanqing wu-hanqing reopened this Dec 20, 2020
@wu-hanqing wu-hanqing force-pushed the discard-v2 branch 10 times, most recently from 2a0a318 to 8c0af56 Compare December 23, 2020 11:36
@wu-hanqing wu-hanqing marked this pull request as ready for review December 23, 2020 11:36
@wu-hanqing wu-hanqing closed this Dec 23, 2020
@wu-hanqing wu-hanqing reopened this Dec 23, 2020
@wu-hanqing wu-hanqing closed this Dec 23, 2020
@wu-hanqing wu-hanqing reopened this Dec 23, 2020
@wu-hanqing wu-hanqing closed this Dec 24, 2020
@wu-hanqing wu-hanqing reopened this Dec 24, 2020
@wu-hanqing wu-hanqing closed this Dec 24, 2020
@wu-hanqing wu-hanqing reopened this Dec 24, 2020
@wu-hanqing wu-hanqing closed this Dec 24, 2020
@wu-hanqing wu-hanqing reopened this Dec 24, 2020
@wu-hanqing wu-hanqing closed this Dec 24, 2020
@wu-hanqing wu-hanqing reopened this Dec 24, 2020
@wu-hanqing wu-hanqing closed this Dec 24, 2020
@wu-hanqing wu-hanqing reopened this Dec 24, 2020
@wu-hanqing wu-hanqing force-pushed the discard-v2 branch 3 times, most recently from 1322e8a to 3ead91d Compare January 6, 2021 11:27
aioctx->ret = -LIBCURVE_ERROR::FAILED;
aioctx->cb(aioctx);
LOG(ERROR) << "allocate tracker failed!";
return -LIBCURVE_ERROR::FAILED;
Member

No need to return an error here; follow the way write handles it.

Contributor Author

fix

}

ReadLockGuard lk(rwlock4Segments_);
return &segments_.at(segmentIndex);
Member

It looks like this line is never reached?

Contributor Author

Some of the code has changed, please re-review this.

@@ -26,8 +26,8 @@ namespace curve {
namespace mds {

// TODO(xuchaojie): these should be in the configuration file later
uint64_t DefaultSegmentSize = kGB * 1;
uint64_t kMiniFileLength = DefaultSegmentSize * 10;
uint64_t DefaultSegmentSize = 128 * kMB;
Member

Was the segment size reduced as well? Has this been decided?

Contributor Author

@wu-hanqing wu-hanqing Apr 23, 2021

It was a mistake; fixed.

inline bool CurveFS::CheckSegmentOffset(const FileInfo& fileInfo,
uint64_t offset) const {
if (offset % fileInfo.segmentsize() != 0) {
LOG(WARNING) << "offset not align with segment, offset = " << offset
Member

These two logs can be at ERROR level: they normally never trigger, so that helps surface problems.

Contributor Author

fix
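
The alignment test in the quoted `CheckSegmentOffset` fragment can be sketched as a standalone function. This is only an illustration: `IsSegmentAligned` is a hypothetical name that does not appear in the PR, and the real method also logs when the check fails (the log level is what this thread is about).

```cpp
#include <cstdint>

// Hypothetical helper, not part of the PR: returns true when `offset`
// is a multiple of `segmentSize`. A zero segment size is reported as
// misaligned so that the modulo is never evaluated with a zero divisor.
inline bool IsSegmentAligned(uint64_t offset, uint64_t segmentSize) {
    return segmentSize != 0 && offset % segmentSize == 0;
}
```

A caller would reject the request when this returns false, as the quoted fragment does after logging.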

if (snapShotFiles.size() != 0) {
LOG(WARNING) << fileName
<< " exist snapshot, num = " << snapShotFiles.size();
return StatusCode::kFileUnderSnapShot;
Member

When a snapshot exists, DeAllocateSegment fails. How is that handled afterwards?

Contributor Author

Nothing is changed, including the cached segment info and the bitmap, and the task is dropped.

return StatusCode::kParaError;
}

if (fileInfo.filestatus() == FileStatus::kFileBeingCloned) {
Member

This condition and the snapshot-exists condition below are the same as what CheckFileCanChange does. Could we call CheckFileCanChange directly?

Contributor Author

CheckFileCanChange also checks whether the file is mounted, but we don't need it here.

}

Operation op1{
OpType::OpDelete,
Member

Why does OpDelete need a value? Is there no interface that takes only a key?

Contributor Author

@ilixiaocui Can you help explain why we need a value here?

int CurveRequestExecutor::Discard(NebdFileInstance* fd,
NebdServerAioContext* aioctx) {
int curveFd = GetCurveFdFromNebdFileInstance(fd);
if (curveFd < 0) {
Contributor

Log an error here.

Contributor Author

fix

curveCombineCtx->nebdCtx = aioctx;
int ret = FromNebdCtxToCurveCtx(aioctx, &curveCombineCtx->curveCtx);
if (ret < 0) {
delete curveCombineCtx;
Contributor

Log an error here.

Contributor Author

fix

}

delete curveCombineCtx;
return -1;
Contributor

Log an error here.

Contributor Author

fix

// for discard
struct DiscardOption {
bool enableDiscard = false;
uint32_t discardTaskDelayMs = 1000 * 60 * 3; // 3 min
Contributor

Should this delay be made configurable?

*
* @return StoreStatus: error code
*/
virtual StoreStatus DiscardSegment(const FileInfo& fileInfo,
Contributor

Add a metric here to count the number of segments that need to be discarded.

Contributor Author

fix

@@ -297,6 +322,8 @@ class CURVE_CACHELINE_ALIGNMENT IOTracker {
// A large IO is split into multiple requests, which are kept in reqlist
std::vector<RequestContext*> reqlist_;

std::vector<SegmentIndex> discardSegments_;
Contributor

Add a comment for this member. Should all discard-related parameters be wrapped in DiscardOption?

Contributor Author

fix

@@ -109,6 +109,45 @@ int Splitor::IO2ChunkRequests(IOTracker* iotracker, MetaCache* metaCache,
return 0;
}

int Splitor::CalcDiscardSegments(IOTracker* iotracker) {
if (iotracker == nullptr) {
Contributor

log

Contributor Author

fix

chunks_() {}

/**
* @brief Test if all bit was discarded
Contributor

test? confirm?

Contributor Author

fix

* @return Return true if all bits are set, otherwise return false
*/
bool IsAllDiscard() {
return discardBitmap_.NextClearBit(0) == curve::common::Bitmap::NO_POS;
Contributor

Doesn't the bitmap query need a read lock?

Contributor Author

No. Before calling this function, the caller already holds the FileSegment's write lock.

void SetDiscard(const uint64_t offset, const uint32_t length);
void ClearDiscard(const uint64_t offset, const uint32_t length);

void ClearBitmap() {
Contributor

What is the difference between ClearDiscard and ClearBitmap? Intuitively, ClearBitmap looks like one step of ClearDiscard.

Contributor Author

Renamed some functions here so that they are more self-explanatory.
As for ClearDiscard and ClearBitmap: ClearBitmap is equivalent to ClearDiscard with offset == 0 and length == segmentsize.
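
The relationship described in this reply can be sketched with a small stand-in class. Everything here is an assumption made for illustration: `SegmentDiscardMap`, the fixed `granularity`, and the requirement that ranges are aligned to it are not taken from the PR, and the class does not use `curve::common::Bitmap`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a per-segment discard bitmap.
// One bit covers `granularity` bytes; granularity must be non-zero.
class SegmentDiscardMap {
 public:
    SegmentDiscardMap(uint64_t segmentSize, uint64_t granularity)
        : segmentSize_(segmentSize),
          granularity_(granularity),
          bits_(static_cast<size_t>(segmentSize / granularity), false) {}

    // Mark [offset, offset + length) as discarded.
    void SetDiscard(uint64_t offset, uint32_t length) {
        Fill(offset, length, true);
    }

    // Mark [offset, offset + length) as in use again.
    void ClearDiscard(uint64_t offset, uint32_t length) {
        Fill(offset, length, false);
    }

    // Reset every bit: same effect as ClearDiscard(0, segmentSize).
    void ClearBitmap() { bits_.assign(bits_.size(), false); }

    // True only when every bit in the segment is set.
    bool IsAllDiscard() const {
        for (bool bit : bits_) {
            if (!bit) return false;
        }
        return true;
    }

 private:
    void Fill(uint64_t offset, uint64_t length, bool value) {
        // Assumption of this sketch: ranges are aligned and in bounds.
        assert(offset % granularity_ == 0 && length % granularity_ == 0);
        assert(offset + length <= segmentSize_);
        const uint64_t end = (offset + length) / granularity_;
        for (uint64_t i = offset / granularity_; i < end; ++i) {
            bits_[static_cast<size_t>(i)] = value;
        }
    }

    uint64_t segmentSize_;
    uint64_t granularity_;
    std::vector<bool> bits_;
};
```

With a 64-byte segment and 16-byte granularity, setting `[0, 32)` and then `[32, 64)` makes `IsAllDiscard()` true, and either `ClearBitmap()` or `ClearDiscard(0, 64)` makes it false again.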

<< ", segment index = " << index
<< ", segment offset = " << index * GiB;
}
}
Contributor

Should the else branch log an error?

Contributor Author

No need to add a log here.

if (!taskManager->ScheduleTask(index, mc_, mdsClient, abstime)) {
LOG(ERROR) << "Schedule discard task failed";
ret = -1;
continue;
Contributor

If one schedule call fails and all the later ones succeed, ret is still set to -1?

Contributor Author

Removed this line; discard operations now always return success, because returning an error may cause a filesystem error, and a failed discard has no harmful effect.
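
The behaviour described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions: `ScheduleDiscardTasks` is a hypothetical free function, the `schedule` callback stands in for `taskManager->ScheduleTask(...)`, and the `failed` output list replaces the error log of the real code.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch: try to schedule a discard task for every segment.
// A scheduling failure is recorded (the real code logs it) but never
// turned into an error return, so the caller always sees success.
int ScheduleDiscardTasks(const std::vector<uint64_t>& segmentIndexes,
                         const std::function<bool(uint64_t)>& schedule,
                         std::vector<uint64_t>* failed) {
    for (uint64_t index : segmentIndexes) {
        if (!schedule(index)) {
            failed->push_back(index);  // best effort: remember and move on
        }
    }
    return 0;  // discard is advisory, so failures are not propagated
}
```

The design choice is that discard only reclaims space: skipping it leaves the data intact, whereas an error returned to the block layer could be surfaced to the filesystem.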

int IOTracker::Wait() {
return iocv_.Wait();
}

void IOTracker::Done() {
if (type_ == OpType::READ || type_ == OpType::WRITE) {
Contributor

Doesn't it need to be released when the OpType is Discard?

Contributor Author

No. Read/write operations hold the corresponding segment's read lock for the whole process. Discard operations hold the segment's write lock only while setting the bitmap; after that, the write lock is released.
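
The locking scheme in this reply can be sketched with `std::shared_mutex` (C++17). This is an illustration only: `SegmentSketch` and its members are hypothetical, and the PR uses its own lock types (for example `ReadLockGuard`) rather than the standard ones shown here.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical model of one file segment and its lock.
class SegmentSketch {
 public:
    // Read/write IO: take the lock in shared mode and hand the guard to the
    // caller, who keeps it until the IO completes (released in Done()).
    std::shared_lock<std::shared_mutex> AcquireForIo() {
        return std::shared_lock<std::shared_mutex>(mtx_);
    }

    // Discard: take the lock exclusively only while updating bookkeeping.
    void MarkDiscarded(uint64_t bytes) {
        std::unique_lock<std::shared_mutex> lk(mtx_);
        discardedBytes_ += bytes;
    }  // exclusive lock released here, so there is nothing left to release

    uint64_t DiscardedBytes() const {
        std::shared_lock<std::shared_mutex> lk(mtx_);
        return discardedBytes_;
    }

 private:
    mutable std::shared_mutex mtx_;
    uint64_t discardedBytes_ = 0;
};
```

Because the discard path drops its lock before returning, only the read and write paths have a guard that must be released when the IO finishes.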


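// IOTracker tracks a single user IO. Because one user IO may span several
// chunkservers, it is split into multiple smaller IOs that are sent
// concurrently, so we need to track the execution of each request sent.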
class CURVE_CACHELINE_ALIGNMENT IOTracker {
friend class Splitor;
Contributor

Why not make CalcDiscardSegments a member function of IOTracker directly?

Contributor Author

Please re-review this; CalcDiscardSegments has been deleted.
As for friend class Splitor: Splitor's functions take many arguments that belong to IOTracker, so we can either absorb all of Splitor's functions into IOTracker or make Splitor a friend class of IOTracker.
For now, I have just added friend class Splitor here and left the parameters of Splitor's functions unchanged. I will refactor Splitor later.

@@ -111,6 +112,8 @@ void IOManager4File::UnInitialize() {
scheduler_->Fini();
}

discardTaskManager_.Stop();
Contributor

Stop does not necessarily stop all the tasks.

Contributor Author

Can you point out which scenario causes this problem? The Stop function first cancels all timed tasks by their timerId, and finally waits until all tasks have been canceled or finished.
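
The "cancel what is pending, then wait for what is running" behaviour described in this reply can be sketched as below. This is a simplified model under stated assumptions: `DiscardTaskRegistry` is a hypothetical class, it does not use bthread timers, and "pending" merely stands for a timer that has not fired yet.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical, simplified task registry (no real timers involved).
class DiscardTaskRegistry {
 public:
    enum class State { kPending, kRunning };

    void Add(uint64_t id) {
        std::lock_guard<std::mutex> lk(mtx_);
        tasks_[id] = State::kPending;
    }

    // Called when a task's timer fires. Returns false if the task was
    // cancelled before it could start.
    bool Start(uint64_t id) {
        std::lock_guard<std::mutex> lk(mtx_);
        auto it = tasks_.find(id);
        if (it == tasks_.end()) return false;
        it->second = State::kRunning;
        return true;
    }

    // Called when a running task completes.
    void Finish(uint64_t id) {
        std::lock_guard<std::mutex> lk(mtx_);
        tasks_.erase(id);
        if (tasks_.empty()) cv_.notify_all();
    }

    // Cancel every pending task, then block until running tasks finish.
    void Stop() {
        std::unique_lock<std::mutex> lk(mtx_);
        for (auto it = tasks_.begin(); it != tasks_.end();) {
            if (it->second == State::kPending) {
                it = tasks_.erase(it);
            } else {
                ++it;
            }
        }
        cv_.wait(lk, [this] { return tasks_.empty(); });
    }

    size_t Size() const {
        std::lock_guard<std::mutex> lk(mtx_);
        return tasks_.size();
    }

 private:
    mutable std::mutex mtx_;
    std::condition_variable cv_;
    std::unordered_map<uint64_t, State> tasks_;
};
```

In this model a task that was cancelled by `Stop()` can no longer start, and `Stop()` returns only once no task is left in the registry.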

@@ -81,6 +81,14 @@ class MetaCache {
virtual MetaCacheErrorType GetChunkInfoByIndex(ChunkIndex chunkidx,
ChunkIDInfo_t* chunkinfo);

/**
* 通过chunk index更新chunkid信息
Contributor

Use English comments.

Contributor Author

fix

*
* @param[in] startKey
* @param[in] endKey
* @param[out] out key and values between [startKey, endKey)
Contributor

In the out parameter, note what the key and the value each represent.

Contributor Author

fix

@@ -176,5 +166,80 @@ StatusCode CleanCore::CleanFile(const FileInfo & commonFile,
progress->SetStatus(TaskStatus::SUCCESS);
return StatusCode::kOK;
}

StatusCode CleanCore::CleanDiscardSegment(
const std::string& cleanSegmentKey,
Contributor

Add a metric to track the segment space currently waiting to be cleaned.

Contributor Author

fix

running_ = true;
taskThread_ = curve::common::Thread(
&CleanDiscardSegmentTask::ScanAndExecTask, this);
}
Contributor

Log in both the if and the else branch.

Contributor Author

fix

sleeper_.interrupt();
taskThread_.join();
LOG(INFO) << "stop CleanDiscardSegmentTask success";
}
Contributor

Log in the else branch as well.

Contributor Author

fix

timer.start();

// delete chunks
int ret = DeleteChunksInSegment(segment, seq);
Contributor

Isn't progress updated during this process?

Contributor Author

No need to update because only progress status is useful for this task.

return StatusCode::kSegmentNotAllocated;
}

if (segment.startoffset() != offset) {
Contributor

If the offset doesn't match, will storage_->GetSegment(fileInfo.id(), offset, &segment); return OK?

Contributor Author

I think it will return KeyNotExist, but this double check is harmless.

}

std::vector<FileInfo> snapShotFiles;
if (storage_->ListSnapshotFile(fileInfo.id(), fileInfo.id() + 1,
Contributor

The status was already checked above. Why is ListSnapshotFile still needed here?

Contributor Author

FileStatus doesn't have snapshot info.

@wu-hanqing wu-hanqing force-pushed the discard-v2 branch 5 times, most recently from ed0f236 to 1be3b83 Compare April 26, 2021 11:10
@wu-hanqing wu-hanqing changed the title support discard mds/client: support discard Apr 26, 2021
@wu-hanqing
Contributor Author

recheck

3 similar comments

for (auto& kv : unfinishedTasks_) {
currentTasks.emplace(kv.first);
}
}
Contributor

Why do we need to make a copy?

Contributor Author

Without the copy here, I would have to take the lock before the for-loop and release it after the loop, which would block other tasks that are going to modify unfinishedTasks_.
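
The pattern described in this reply, copying the keys under the lock and then iterating over the copy without it, can be sketched as follows. `UnfinishedTasks`, `CancelAll`, and the placeholder map value are hypothetical; only the idea of snapshotting `unfinishedTasks_` comes from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <unordered_set>

// Hypothetical registry of unfinished tasks keyed by timer id.
class UnfinishedTasks {
 public:
    void Add(uint64_t timerId) {
        std::lock_guard<std::mutex> lk(mtx_);
        tasks_.emplace(timerId, 0);  // value is a placeholder in this sketch
    }

    // Returns true if the id was present and has been removed.
    bool Remove(uint64_t timerId) {
        std::lock_guard<std::mutex> lk(mtx_);
        return tasks_.erase(timerId) > 0;
    }

    // Cancel every task known at the time of the call and return how many
    // were actually removed by this call.
    size_t CancelAll() {
        std::unordered_set<uint64_t> current;
        {
            std::lock_guard<std::mutex> lk(mtx_);
            for (const auto& kv : tasks_) {
                current.emplace(kv.first);
            }
        }  // lock released before the potentially slow loop below

        size_t cancelled = 0;
        for (uint64_t id : current) {
            // Remove() takes the lock only briefly, so other threads can
            // still add or finish tasks while this loop runs.
            if (Remove(id)) ++cancelled;
        }
        return cancelled;
    }

    size_t Size() const {
        std::lock_guard<std::mutex> lk(mtx_);
        return tasks_.size();
    }

 private:
    mutable std::mutex mtx_;
    std::unordered_map<uint64_t, int> tasks_;
};
```

The cost is one extra copy of the key set; the benefit is that the mutex is never held across the whole cancellation loop.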

} else if (ret == EINVAL) {
LOG(WARNING)
<< "Task has been completed or some error occurs, taskid = "
<< timerId;
Contributor

If an error occurs, have the related resources been reclaimed?

Contributor Author

The log message has been changed. According to bthread_timer_del's return code, EINVAL means the id does not exist. Two scenarios can cause this: either the task has finished and its id was removed, or the task id is invalid.

timespec abstime =
butil::milliseconds_from_now(discardOption_.taskDelayMs);
if (!taskManager->ScheduleTask(index, mc_, mdsClient, abstime)) {
LOG(ERROR) << "Schedule discard task failed";
Contributor

Print the filename and offset here.

Contributor Author

fix

@@ -66,6 +67,11 @@ class IOManager4File : public IOManager {
const IOOption& ioOpt,
MDSClient* mdsclient);

/**
* 析构,回收资源
Contributor

english

Contributor Author

fix

@ilixiaocui ilixiaocui merged commit 5a4889d into opencurve:master May 14, 2021
@wu-hanqing wu-hanqing deleted the discard-v2 branch May 17, 2021 02:45
ilixiaocui pushed a commit to ilixiaocui/curve that referenced this pull request Feb 6, 2023
* Project Idea

#### Developer Experience
##### Kruise Deployment Wizard

Proposed Project

* Update project to focus on Argo

Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>

Co-authored-by: Chris Aniszczyk <caniszczyk@gmail.com>
4 participants