-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stats report daemon to Velox #9653
Conversation
✅ Deploy Preview for meta-velox canceled.
|
e6473b4
to
63e6cf3
Compare
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
63e6cf3
to
96b73b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanjialiang thanks for the change % minors.
// Some cumulative metrics needs this previous state to calculate the delta | ||
// for reporting. | ||
std::map<std::string, MetricUnion> prevSimpleStatsHolder_; | ||
velox::memory::MemoryArbitrator::Stats prevArbitratorStats_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/prevArbitratorStats_/lastArbitratorStats_/
velox::memory::MemoryArbitrator::Stats prevArbitratorStats_; | ||
folly::ThreadedRepeatingFunctionRunner scheduler_; | ||
}; | ||
} // namespace facebook::velox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide global utility to start and stop the singleton background daemon? We can do that in followup as well. Thanks!
} | ||
|
||
void PeriodicStatsReportDaemon::addLowFrequencyReports() { | ||
static constexpr uint64_t kLowFrequencyReportIntervalMs{60'000}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just let each component decides its own frequency and register separately? Thanks!
memoryCacheStats.hitBytes; | ||
prevSimpleStatsHolder_[kLastMemoryCacheInserts].int64Value = | ||
memoryCacheStats.numNew; | ||
prevSimpleStatsHolder_[kLastMemoryCacheEvictions].int64Value = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just record the previous raw stats instead of using prevSimpleStatsHolder_? Thanks!
VELOX_CHECK_GE(updatedArbitratorStats, prevArbitratorStats_); | ||
const auto deltaArbitratorStats = | ||
updatedArbitratorStats - prevArbitratorStats_; | ||
REPORT_IF_NOT_ZERO( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check SharedArbitrator code? I think most of metrics have already been reported. And we only need to report the average arbitrator free spaces here? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synced offline. Will followup remove the duplicated reported metrics in arbitrator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do in this PR? We can't duplicate the report from velox side.
const velox::cache::AsyncDataCache* const asyncDataCache, | ||
const velox::memory::MemoryArbitrator* const arbitrator) | ||
: memoryAllocator_(memoryAllocator), | ||
asyncDataCache_(asyncDataCache), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/asyncDataCache_/cache_/
96b73b9
to
585b8b4
Compare
class PeriodicStatsReportDaemon { | ||
public: | ||
PeriodicStatsReportDaemon( | ||
const velox::memory::MemoryAllocator* const memoryAllocator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const velox::memory::MemoryAllocator* memoryAllocator
dittos
public: | ||
PeriodicStatsReportDaemon( | ||
const velox::memory::MemoryAllocator* const memoryAllocator, | ||
const velox::cache::AsyncDataCache* const asyncDataCache, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
s/memoryAllocator/allocator/
s/asyncDataCache/cache/
private: | ||
// Add a task to run periodically. | ||
template <typename TFunc> | ||
void addTask(TFunc&& func, size_t periodMicros, const std::string& taskName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/periodMicros/peridoicIntervalUs or intervalUs/
private: | ||
// Add a task to run periodically. | ||
template <typename TFunc> | ||
void addTask(TFunc&& func, size_t periodMicros, const std::string& taskName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put taskName first?
name, func, periodicIntervalUs
LOG(ERROR) << "Error running periodic task " << taskName << ": " | ||
<< e.what(); | ||
} | ||
return std::chrono::milliseconds(periodMicros / 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is always in milliseconds, then how about just pass ms? like intervalMs?
cache::CacheStats lastCacheStats_; | ||
cache::SsdCacheStats lastSsdStats_; | ||
velox::memory::MemoryArbitrator::Stats lastArbitratorStats_; | ||
folly::ThreadedRepeatingFunctionRunner scheduler_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put scheduler_ ahead of saved last stats? Thanks!
arbitrator_(arbitrator) {} | ||
|
||
void PeriodicStatsReportDaemon::start() { | ||
addTask( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addAllocatorStatsReport();
addCacheStatsReport();
addArbitratorStatsReport()?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we keep it this way instead of creating 3 methods for 3 one-liner to keep the class a bit simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine. Leave one empty line in between. Thanks!
} | ||
} | ||
|
||
lastCacheStats_ = cacheStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just access ssdStats from cacheStats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't. ssdStats is a shared_ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't keep a shared_ptr?
VELOX_CHECK_GE(updatedArbitratorStats, prevArbitratorStats_); | ||
const auto deltaArbitratorStats = | ||
updatedArbitratorStats - prevArbitratorStats_; | ||
REPORT_IF_NOT_ZERO( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do in this PR? We can't duplicate the report from velox side.
@xiaoxmeng We can keep it this way and, when changing to plug this class in with presto-native, we can remove the duplicated ones from presto-native to avoid double reporting. It can be quite easy to forget to add back these particular scattered lines. What do you think? |
1cc913b
to
dd1db8a
Compare
#include "velox/common/memory/Memory.h" | ||
#include "velox/common/memory/MmapAllocator.h" | ||
|
||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move anonymous namespace inside velox namespace
dd1db8a
to
55ca641
Compare
0aae33e
to
05854c7
Compare
const velox::memory::MemoryAllocator* allocator, | ||
const velox::cache::AsyncDataCache* cache, | ||
const velox::memory::MemoryArbitrator* arbitrator, | ||
Options options = Options()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const Options& options
cache_(cache), | ||
arbitrator_(arbitrator), | ||
options_(options) { | ||
lastCacheStats_.ssdStats = std::make_shared<cache::SsdCacheStats>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not explicit check if ssd is null in stats update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do that, too
3782229
to
f1d8d68
Compare
velox/common/base/Counters.h
Outdated
constexpr folly::StringPiece kMetricArbitratorArbitrationTimeUs{ | ||
"velox.arbitrator_arbitration_time_us"}; | ||
|
||
constexpr folly::StringPiece kMetricArbitratorNumShrunkBytes{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's skip reporting the following three for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the migration the same as presto-native and modify the migrated ones in followups if needed
velox/common/base/Counters.h
Outdated
@@ -128,4 +119,28 @@ constexpr folly::StringPiece kMetricSpillWriteTimeMs{ | |||
|
|||
constexpr folly::StringPiece kMetricFileWriterEarlyFlushedRawBytes{ | |||
"velox.file_writer_early_flushed_raw_bytes"}; | |||
|
|||
constexpr folly::StringPiece kMetricArbitratorNumRequests{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update document for the new ones? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counter document is in .cpp per Velox style, which is different from presto
REPORT_IF_NOT_ZERO( | ||
kMetricArbitratorNumReclaimedBytes, | ||
deltaArbitratorStats.numReclaimedBytes); | ||
REPORT_IF_NOT_ZERO( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shall report kMetricArbitratorFreeCapacityBytes and kMetricArbitratorFreeReservedCapacityBytes from the latest stats? Thanks!
7ca5de5
to
c346bc4
Compare
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
velox/docs/monitoring/metrics.rst
Outdated
@@ -113,8 +113,8 @@ Memory Management | |||
is not sufficient. It excludes counting instances where the operator is in a | |||
non-reclaimable state due to currently being on-thread and running or being already | |||
cancelled. | |||
* - arbitrator_requests_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the original names?
Summary: The stats report daemon is used for periodically exporting velox metrics. Current supported metrics are memory related metrics. There will be followups for additional metrics. Pull Request resolved: facebookincubator#9653 Reviewed By: xiaoxmeng Differential Revision: D56690811 Pulled By: tanjialiang
This pull request was exported from Phabricator. Differential Revision: D56690811 |
Summary: The stats report daemon is used for periodically exporting velox metrics. Current supported metrics are memory related metrics. There will be followups for additional metrics. Pull Request resolved: facebookincubator#9653 Reviewed By: xiaoxmeng Differential Revision: D56690811 Pulled By: tanjialiang
This pull request was exported from Phabricator. Differential Revision: D56690811 |
@tanjialiang merged this pull request in ebcbec7. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: The stats report daemon is used for periodically exporting velox metrics. Current supported metrics are memory related metrics. There will be followups for additional metrics. Pull Request resolved: facebookincubator#9653 Reviewed By: xiaoxmeng Differential Revision: D56690811 Pulled By: tanjialiang fbshipit-source-id: e6f7236df9ea1445355f72b6f94b52704e0e1f4e
The stats report daemon is used for periodically exporting velox metrics. Current supported metrics are memory related metrics. There will be followups for additional metrics.