*: optimize the layout of global sort files and other #48275
Codecov Report
@@ Coverage Diff @@
## master #48275 +/- ##
=================================================
- Coverage 71.0083% 54.0861% -16.9222%
=================================================
Files 1367 1582 +215
Lines 404865 596180 +191315
=================================================
+ Hits 287488 322451 +34963
- Misses 97372 251125 +153753
- Partials 20005 22604 +2599
Flags with carried forward coverage won't be shown.
0ed326c to 0ffe2e0
Signed-off-by: lance6716 <lance6716@gmail.com>
}
// If the reader has fewer than n bytes remaining in the current buffer,
// `auxBuf` is used as a container instead.
if n > 1024*1024*1024 {
Why is it 1024*1024*1024?
I think it's defensive coding. The author of this line is @wjhuang2016; let's ask him.
Yes, if something goes wrong we can return an error instead of panicking. I just chose a large number.
I think it is better to do this outside of readNBytes() and log the file name and other info. Could you change this to 1 * size.GB instead and add a comment for it?
binary.BigEndian.AppendUint64(dataBuf[:lengthBytes], uint64(len(idxVal)))
keyAdapter.Encode(dataBuf[2*lengthBytes:2*lengthBytes:2*lengthBytes+encodedKeyLen], idxKey, rowID)
copy(dataBuf[2*lengthBytes+encodedKeyLen:], idxVal)
The onefile writer should be changed too.
I have changed the order to keyLen, valueLen, key, value in onefile_writer.go, lines 108-111.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: lance6716, tangenta, YuJuncen.
What problem does this PR solve?
Issue Number: ref #48779
Problem Summary:
What is changed and how it works?
The old layout is
keyLen: uint64, key: char[keyLen], valueLen: uint64, value: char[valueLen].
In this PR we change it to
keyLen: uint64, valueLen: uint64, key: char[keyLen], value: char[valueLen].
Compared with the old layout, both lengths now sit in a fixed 16-byte header, so the reader can fetch key and value in one IO and split the data afterwards. Each IO also no longer requires the previous data buffer to be retained in memory.
Check List
Tests
Benchmark of merge iterator reading
Side effects
Documentation
Release note