Add file read ahead through AsyncDataCache #3389
Conversation
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
There are some suggestions regarding removing doneIndices from allCoalescedLoads_. Also, the new test is not passing TSAN.
auto nextQuantum = position_ - offsetInQuantum + loadQuantum_;
auto prefetchThreshold = loadQuantum_ * prefetchPct_ / 100;
if (!prefetchStarted_ && offsetInQuantum + *size > prefetchThreshold &&
    position_ - offsetInQuantum + loadQuantum_ < region_.length) {
nextQuantum < region_.length?
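The reviewer's point is that the second conjunct repeats the expression already bound to nextQuantum. A minimal standalone sketch of the trigger check with the suggested substitution (shouldStartPrefetch and its free parameters are hypothetical stand-ins for the member state in the diff, not Velox API):

```cpp
#include <cstdint>

// Hypothetical standalone version of the prefetch trigger. 'position' is the
// absolute read position, 'offsetInQuantum' the offset within the current
// load quantum, 'size' the number of bytes about to be read.
bool shouldStartPrefetch(
    bool prefetchStarted,
    int64_t position,
    int64_t offsetInQuantum,
    int64_t size,
    int64_t loadQuantum,
    int64_t regionLength,
    int32_t prefetchPct) {
  // First byte of the next load quantum.
  const int64_t nextQuantum = position - offsetInQuantum + loadQuantum;
  // Start the prefetch once the read has advanced past prefetchPct% of the
  // current quantum, and only if a next quantum exists inside the region.
  const int64_t prefetchThreshold = loadQuantum * prefetchPct / 100;
  return !prefetchStarted && offsetInQuantum + size > prefetchThreshold &&
      nextQuantum < regionLength;
}
```

With a 1000-byte quantum and prefetchPct of 50, a read reaching byte 610 of the current quantum triggers the prefetch, unless one is already started or the next quantum would start past the end of the region.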
// Remove the loads that were done. There can be done loads if the same
// CachedBufferedInput has multiple cycles of enqueues and loads.
for (int32_t i = doneIndices.size() - 1; i >= 0; --i) {
  allCoalescedLoads_.erase(allCoalescedLoads_.begin() + doneIndices[i]);
This should be outside the loop over allCoalescedLoads_. Also, this is more efficient if allCoalescedLoads_ is large:

std::swap(allCoalescedLoads_[doneIndices[i]], allCoalescedLoads_.back());
allCoalescedLoads_.pop_back();
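The suggested swap-and-pop pattern removes each done entry in O(1) instead of shifting the whole tail on every erase, at the cost of not preserving element order. A minimal standalone sketch, using int stand-ins for the load objects (eraseDone is a hypothetical helper, not Velox code):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Removes the elements of 'loads' at the positions in 'doneIndices'
// (sorted ascending). Iterating from the back keeps the remaining indices
// valid after each removal; each removal is a swap plus a pop_back, so the
// total cost is O(doneIndices.size()) rather than O(loads.size()) per erase.
void eraseDone(std::vector<int>& loads, const std::vector<int32_t>& doneIndices) {
  for (auto i = static_cast<int32_t>(doneIndices.size()) - 1; i >= 0; --i) {
    std::swap(loads[doneIndices[i]], loads.back());
    loads.pop_back();
  }
}
```

This is acceptable here because allCoalescedLoads_ is an unordered registry of pending loads, so the survivors' relative order does not matter.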
Force-pushed from c1c8bb0 to fb50810.
Summary: Adds a code sample and supporting functions for using CachedBufferedInput for smart prefetching of sequential files. A SeekableInputStream is registered for each file of interest. Each time the read proceeds past a given fraction of the current load quantum, the load of the next quantum is scheduled. The load prefetches into AsyncDataCache, and the data will be found there when moving past the end of the current quantum. Prefetch fails silently if there is no memory or if over half the cache is taken by not-yet-accessed prefetched data.

Adds a special StreamIdentifier to denote expected sequential access. The default for no StreamIdentifier in enqueue is preloading the whole enqueued range.

Adds a mode where each visited cache entry is made evictable immediately after unpinning. This avoids polluting the cache with one-time large sequential accesses.

Adds a test that simulates a multifile merge. Each thread has 100 files and consumes a chunk of each in turn.

Pull Request resolved: facebookincubator#3389
Reviewed By: Yuhta
Differential Revision: D41642368
Pulled By: oerling
fbshipit-source-id: 3272aca53cc3beb7b8a5489418802d552b92441e
This pull request was exported from Phabricator. Differential Revision: D41642368
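The read-ahead scheme from the summary can be modeled as a small state machine: reads advance through a file in loadQuantum-sized windows, and once a read passes prefetchPct% of the current window, the next window is scheduled for load (into AsyncDataCache in the real code; here it is just recorded). All names below are illustrative stand-ins, not the Velox API:

```cpp
#include <cstdint>
#include <set>

// Toy model of quantum-based read-ahead. 'scheduled' collects the start
// offsets of the quanta that would have been prefetched.
struct ReadAhead {
  int64_t loadQuantum;
  int32_t prefetchPct;
  int64_t fileSize;
  std::set<int64_t> scheduled;

  // Called for each read of 'size' bytes at absolute 'position'.
  void onRead(int64_t position, int64_t size) {
    const int64_t quantumStart = position / loadQuantum * loadQuantum;
    const int64_t offsetInQuantum = position - quantumStart;
    const int64_t nextQuantum = quantumStart + loadQuantum;
    // Schedule the next quantum once the read crosses the threshold,
    // provided the next quantum starts inside the file.
    if (offsetInQuantum + size > loadQuantum * prefetchPct / 100 &&
        nextQuantum < fileSize) {
      scheduled.insert(nextQuantum); // set insert stands in for prefetchStarted_
    }
  }
};
```

With a 3000-byte file, 1000-byte quanta, and a 70% threshold, reads that cross byte 700 of the first and second quanta schedule prefetches of the quanta starting at 1000 and 2000; no prefetch is issued past the last quantum.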