-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mds/client: support discard #189
Conversation
e67ade3
to
4617fd2
Compare
2a0a318
to
8c0af56
Compare
8c0af56
to
7b0799f
Compare
1322e8a
to
3ead91d
Compare
src/client/iomanager4file.cpp
Outdated
aioctx->ret = -LIBCURVE_ERROR::FAILED; | ||
aioctx->cb(aioctx); | ||
LOG(ERROR) << "allocate tracker failed!"; | ||
return -LIBCURVE_ERROR::FAILED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个地方不需要返回错误,参考write的写法
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
src/client/metacache.cpp
Outdated
} | ||
|
||
ReadLockGuard lk(rwlock4Segments_); | ||
return &segments_.at(segmentIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里好像跑不到?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some codes have changed, please re-review this.
src/mds/common/mds_define.cpp
Outdated
@@ -26,8 +26,8 @@ namespace curve { | |||
namespace mds { | |||
|
|||
// TODO(xuchaojie): these should be in the configuration file later | |||
uint64_t DefaultSegmentSize = kGB * 1; | |||
uint64_t kMiniFileLength = DefaultSegmentSize * 10; | |||
uint64_t DefaultSegmentSize = 128 * kMB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
segmentsize 也一并改小了?这个确定了吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a mistake, fix
src/mds/nameserver2/curvefs.cpp
Outdated
inline bool CurveFS::CheckSegmentOffset(const FileInfo& fileInfo, | ||
uint64_t offset) const { | ||
if (offset % fileInfo.segmentsize() != 0) { | ||
LOG(WARNING) << "offset not align with segment, offset = " << offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个日志可以打Error,因为一般不会触发,有助于发现问题
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
if (snapShotFiles.size() != 0) { | ||
LOG(WARNING) << fileName | ||
<< " exist snapshot, num = " << snapShotFiles.size(); | ||
return StatusCode::kFileUnderSnapShot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有快照的情况下,DeAllocateSegment失败,后续怎么处理?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing changed including cached segment info and bitmap, and this task will be dropped.
return StatusCode::kParaError; | ||
} | ||
|
||
if (fileInfo.filestatus() == FileStatus::kFileBeingCloned) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个条件和下面这个存在快照不能删除的条件,跟CheckFileCanChange函数内容一致,是否可以直接调用CheckFileCanChange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CheckFileCanChange also checks whether the file is mounted, but we don't need it here.
} | ||
|
||
Operation op1{ | ||
OpType::OpDelete, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpDelete 为啥需要传value,没有只传key的接口?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilixiaocui Can you help to explain why we need value in here?
int CurveRequestExecutor::Discard(NebdFileInstance* fd, | ||
NebdServerAioContext* aioctx) { | ||
int curveFd = GetCurveFdFromNebdFileInstance(fd); | ||
if (curveFd < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
打下error日志
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
curveCombineCtx->nebdCtx = aioctx; | ||
int ret = FromNebdCtxToCurveCtx(aioctx, &curveCombineCtx->curveCtx); | ||
if (ret < 0) { | ||
delete curveCombineCtx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
打印error日志
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
} | ||
|
||
delete curveCombineCtx; | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
打印error日志
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
src/client/config_info.h
Outdated
// for discard | ||
struct DiscardOption { | ||
bool enableDiscard = false; | ||
uint32_t discardTaskDelayMs = 1000 * 60 * 3; // 3 min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个时间是否设置为可配置?
* | ||
* @return StoreStatus: error code | ||
*/ | ||
virtual StoreStatus DiscardSegment(const FileInfo& fileInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里增加metric统计一下需要discard的segment数量
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
@@ -297,6 +322,8 @@ class CURVE_CACHELINE_ALIGNMENT IOTracker { | |||
// 大IO被拆分成多个request,这些request放在reqlist中国保存 | |||
std::vector<RequestContext*> reqlist_; | |||
|
|||
std::vector<SegmentIndex> discardSegments_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
成员函数注释。discard相关的参数是否都封装到 DiscardOption 中
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
src/client/splitor.cpp
Outdated
@@ -109,6 +109,45 @@ int Splitor::IO2ChunkRequests(IOTracker* iotracker, MetaCache* metaCache, | |||
return 0; | |||
} | |||
|
|||
int Splitor::CalcDiscardSegments(IOTracker* iotracker) { | |||
if (iotracker == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
src/client/metacache_struct.h
Outdated
chunks_() {} | ||
|
||
/** | ||
* @brief Test if all bit was discarded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test?confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
* @return Return true if if all bits are set, otherwise return false | ||
*/ | ||
bool IsAllDiscard() { | ||
return discardBitmap_.NextClearBit(0) == curve::common::Bitmap::NO_POS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bitmap查询不需要加读锁吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, before calling this function, the caller has held FileSegment
's write lock.
void SetDiscard(const uint64_t offset, const uint32_t length); | ||
void ClearDiscard(const uint64_t offset, const uint32_t length); | ||
|
||
void ClearBitmap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clearDiscard和ClearBitmap有啥区别,直观上觉得ClearBitmp是ClearDiscard的一个步骤
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed some functions here to better explain its self.
And for ClearDiscard
and ClearBitmap
, ClearBitmap
is equal to ClearDiscard
when offset == 0 and length == segmentsize;
<< ", segment index = " << index | ||
<< ", segment offset = " << index * GiB; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else错误日志?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add log in here.
src/client/io_tracker.cpp
Outdated
if (!taskManager->ScheduleTask(index, mc_, mdsClient, abstime)) { | ||
LOG(ERROR) << "Schedule discard task failed"; | ||
ret = -1; | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果其中一个schedule出错,后面都是正常返回,ret还是设置为-1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this line, and always return success for discard operations. Because return error may cause filesystem error, and failed discard operations have no bad effect.
int IOTracker::Wait() { | ||
return iocv_.Wait(); | ||
} | ||
|
||
void IOTracker::Done() { | ||
if (type_ == OpType::READ || type_ == OpType::WRITE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optype为Discard的时候不需要释放吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, read/write operations will hold the corresponding segment‘s read lock in the whole process. And, discard operations hold segment's write lock when set bitmap, and after that, the write-lock is released.
|
||
// IOTracker用于跟踪一个用户IO,因为一个用户IO可能会跨chunkserver, | ||
// 因此在真正下发的时候会被拆分成多个小IO并发的向下发送,因此我们需要 | ||
// 跟踪发送的request的执行情况。 | ||
class CURVE_CACHELINE_ALIGNMENT IOTracker { | ||
friend class Splitor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为啥不直接把CalcDiscardSegments作为IOTracker的成员函数?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please re-review this, CalcDiscardSegments
has been deleted.
And for this friend class Splitor
, I think functions in Splitor
take some many arguments that belong to IOTracker
, either we can absorb all Splitor's functions into IOTracker
or make Splitor
as a friend class of IOTracker
.
Now, I just add friend class Splitor
in here and leave Splitor
's functions parameters unchanged. Later I will refactor Splitor
.
src/client/iomanager4file.cpp
Outdated
@@ -111,6 +112,8 @@ void IOManager4File::UnInitialize() { | |||
scheduler_->Fini(); | |||
} | |||
|
|||
discardTaskManager_.Stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stop并不一定能把任务都停掉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point out which scenario can cause this problem? The Stop
function first cancels all timed tasks according to its timerId, and in the last, it will wait until all tasks have been canceled or finished.
src/client/metacache.h
Outdated
@@ -81,6 +81,14 @@ class MetaCache { | |||
virtual MetaCacheErrorType GetChunkInfoByIndex(ChunkIndex chunkidx, | |||
ChunkIDInfo_t* chunkinfo); | |||
|
|||
/** | |||
* 通过chunk index更新chunkid信息 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
英文注释
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
src/kvstorageclient/etcd_client.h
Outdated
* | ||
* @param[in] startKey | ||
* @param[in] endKey | ||
* @param[out] out key and values between [startKey, endKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out中备注一下key value分别代表什么
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
@@ -176,5 +166,80 @@ StatusCode CleanCore::CleanFile(const FileInfo & commonFile, | |||
progress->SetStatus(TaskStatus::SUCCESS); | |||
return StatusCode::kOK; | |||
} | |||
|
|||
StatusCode CleanCore::CleanDiscardSegment( | |||
const std::string& cleanSegmentKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
加metric统计下当前待清理的segement空间
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
running_ = true; | ||
taskThread_ = curve::common::Thread( | ||
&CleanDiscardSegmentTask::ScanAndExecTask, this); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if 和 else 都分别打印下日志吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
sleeper_.interrupt(); | ||
taskThread_.join(); | ||
LOG(INFO) << "stop CleanDiscardSegmentTask success"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else也打下日志
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
timer.start(); | ||
|
||
// delete chunks | ||
int ret = DeleteChunksInSegment(segment, seq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
progress在这个过程中不更新吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to update because only progress status is useful for this task.
return StatusCode::kSegmentNotAllocated; | ||
} | ||
|
||
if (segment.startoffset() != offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果offset不一致,storage_->GetSegment(fileInfo.id(), offset, &segment);会返回ok吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will return KeyNotExist
, but this double check is harmless.
} | ||
|
||
std::vector<FileInfo> snapShotFiles; | ||
if (storage_->ListSnapshotFile(fileInfo.id(), fileInfo.id() + 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
上面已经判断了状态,这里为什么还需要listsnapshotfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileStatus doesn't have snapshot info.
ed0f236
to
1be3b83
Compare
recheck |
3 similar comments
recheck |
recheck |
recheck |
for (auto& kv : unfinishedTasks_) { | ||
currentTasks.emplace(kv.first); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need make a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if didn't make a copy here, I have to hold the lock before the for-loop
, and release it after the for-loop
, and it will block other tasks that going to modify unfinishedTasks_
.
} else if (ret == EINVAL) { | ||
LOG(WARNING) | ||
<< "Task has been completed or some error occurs, taskid = " | ||
<< timerId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if error occurs, Whether the related resources have been recycled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message is changed, according to bthread_timer_del
's return code, EINVAL
means id does not exist. So, two scenarios can cause this, first is task is finished and task id is removed, second is task id is invalid.
src/client/io_tracker.cpp
Outdated
timespec abstime = | ||
butil::milliseconds_from_now(discardOption_.taskDelayMs); | ||
if (!taskManager->ScheduleTask(index, mc_, mdsClient, abstime)) { | ||
LOG(ERROR) << "Schedule discard task failed"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print filename、offset here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
src/client/iomanager4file.h
Outdated
@@ -66,6 +67,11 @@ class IOManager4File : public IOManager { | |||
const IOOption& ioOpt, | |||
MDSClient* mdsclient); | |||
|
|||
/** | |||
* 析构,回收资源 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
english
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
* Project Idea #### Developer Experience ##### Kruise Deployment Wizard Proposed Project * Update project to focus on Argo Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Co-authored-by: Chris Aniszczyk <caniszczyk@gmail.com>
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
What is changed and how it works?
What's Changed:
How it Works:
Side effects(Breaking backward compatibility? Performance regression?):
Check List