Add stats report daemon to Velox #9653

Closed
wants to merge 1 commit into from

tanjialiang
Contributor

The stats report daemon periodically exports Velox metrics. Currently only memory-related metrics are supported; follow-ups will add more metrics.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 29, 2024

netlify bot commented Apr 29, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit b7079e2
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6632b73f625e87000849bb94

@facebook-github-bot
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@xiaoxmeng xiaoxmeng left a comment

@tanjialiang thanks for the change % minors.

// Some cumulative metrics needs this previous state to calculate the delta
// for reporting.
std::map<std::string, MetricUnion> prevSimpleStatsHolder_;
velox::memory::MemoryArbitrator::Stats prevArbitratorStats_;
Contributor

nit: s/prevArbitratorStats_/lastArbitratorStats_/

velox/common/base/PeriodicStatsReportDaemon.cpp Outdated
velox::memory::MemoryArbitrator::Stats prevArbitratorStats_;
folly::ThreadedRepeatingFunctionRunner scheduler_;
};
} // namespace facebook::velox
Contributor

Can you provide a global utility to start and stop the singleton background daemon? We can do that in a follow-up as well. Thanks!
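
A minimal sketch of what such a global start/stop utility could look like, using only the standard library. `StatsReporter`, `startStatsReporter`, and `stopStatsReporter` are hypothetical names chosen for illustration; they are not taken from this PR or from Velox.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the reporter class discussed in this PR.
class StatsReporter {
 public:
  void start() { running_ = true; }
  void stop() { running_ = false; }
  bool running() const { return running_; }

 private:
  bool running_{false};
};

namespace {
// Function-local statics avoid static initialization order problems.
std::mutex& instanceMutex() {
  static std::mutex mutex;
  return mutex;
}

std::unique_ptr<StatsReporter>& instance() {
  static std::unique_ptr<StatsReporter> reporter;
  return reporter;
}
} // namespace

// Creates and starts the process-wide reporter.
// Returns false if one is already running.
bool startStatsReporter() {
  std::lock_guard<std::mutex> guard(instanceMutex());
  if (instance() != nullptr) {
    return false;
  }
  instance() = std::make_unique<StatsReporter>();
  instance()->start();
  assert(instance()->running());
  return true;
}

// Stops and destroys the process-wide reporter.
// Returns false if none is running.
bool stopStatsReporter() {
  std::lock_guard<std::mutex> guard(instanceMutex());
  if (instance() == nullptr) {
    return false;
  }
  instance()->stop();
  instance().reset();
  return true;
}
```

Returning a bool rather than throwing on a double start is only one possible choice; a real utility might prefer a hard check instead.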

}

void PeriodicStatsReportDaemon::addLowFrequencyReports() {
static constexpr uint64_t kLowFrequencyReportIntervalMs{60'000};
Contributor

Could we just let each component decide its own frequency and register separately? Thanks!

memoryCacheStats.hitBytes;
prevSimpleStatsHolder_[kLastMemoryCacheInserts].int64Value =
memoryCacheStats.numNew;
prevSimpleStatsHolder_[kLastMemoryCacheEvictions].int64Value =
Contributor

How about we just record the previous raw stats instead of using prevSimpleStatsHolder_? Thanks!
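
As a rough illustration of this suggestion, the sketch below keeps the last raw stats snapshot and reports per-field deltas, skipping zero values. `RawCacheStats`, `CacheDeltaReporter`, `MetricSink`, and the metric names are hypothetical; only the field names `hitBytes` and `numNew` appear in the snippet above, and `numEvict` is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a cumulative cache stats struct.
struct RawCacheStats {
  int64_t hitBytes{0};
  int64_t numNew{0};
  int64_t numEvict{0};

  // Field-wise difference between two snapshots.
  RawCacheStats operator-(const RawCacheStats& other) const {
    RawCacheStats delta;
    delta.hitBytes = hitBytes - other.hitBytes;
    delta.numNew = numNew - other.numNew;
    delta.numEvict = numEvict - other.numEvict;
    return delta;
  }
};

// Hypothetical sink that receives (metric name, value) pairs.
using MetricSink = std::function<void(const std::string&, int64_t)>;

class CacheDeltaReporter {
 public:
  explicit CacheDeltaReporter(MetricSink sink) : sink_(std::move(sink)) {}

  // Reports the change since the previous call, then remembers 'current'.
  void report(const RawCacheStats& current) {
    // Cumulative counters are expected to never decrease.
    assert(current.hitBytes >= last_.hitBytes);
    assert(current.numNew >= last_.numNew);
    assert(current.numEvict >= last_.numEvict);

    const RawCacheStats delta = current - last_;
    reportIfNotZero("cache.hit_bytes", delta.hitBytes);
    reportIfNotZero("cache.inserts", delta.numNew);
    reportIfNotZero("cache.evictions", delta.numEvict);
    last_ = current;
  }

 private:
  void reportIfNotZero(const std::string& name, int64_t value) {
    if (value != 0) {
      sink_(name, value);
    }
  }

  MetricSink sink_;
  RawCacheStats last_;
};
```

Storing the whole snapshot means a new counter only needs one extra line in `report`, rather than a new key in a separate map.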

VELOX_CHECK_GE(updatedArbitratorStats, prevArbitratorStats_);
const auto deltaArbitratorStats =
updatedArbitratorStats - prevArbitratorStats_;
REPORT_IF_NOT_ZERO(
Contributor

Can you check the SharedArbitrator code? I think most of the metrics are already reported there, so we only need to report the average arbitrator free space here? Thanks!

Contributor Author

Synced offline. Will follow up to remove the duplicated metrics reported in the arbitrator.

Contributor

Let's do it in this PR? We can't duplicate the report from the Velox side.

const velox::cache::AsyncDataCache* const asyncDataCache,
const velox::memory::MemoryArbitrator* const arbitrator)
: memoryAllocator_(memoryAllocator),
asyncDataCache_(asyncDataCache),
Contributor

s/asyncDataCache_/cache_/

class PeriodicStatsReportDaemon {
public:
PeriodicStatsReportDaemon(
const velox::memory::MemoryAllocator* const memoryAllocator,
Contributor

const velox::memory::MemoryAllocator* memoryAllocator

Ditto for the other parameters.

public:
PeriodicStatsReportDaemon(
const velox::memory::MemoryAllocator* const memoryAllocator,
const velox::cache::AsyncDataCache* const asyncDataCache,
Contributor

nit:
s/memoryAllocator/allocator/
s/asyncDataCache/cache/

private:
// Add a task to run periodically.
template <typename TFunc>
void addTask(TFunc&& func, size_t periodMicros, const std::string& taskName) {
Contributor

s/periodMicros/periodicIntervalUs or intervalUs/

private:
// Add a task to run periodically.
template <typename TFunc>
void addTask(TFunc&& func, size_t periodMicros, const std::string& taskName) {
Contributor

Put taskName first?

name, func, periodicIntervalUs

LOG(ERROR) << "Error running periodic task " << taskName << ": "
<< e.what();
}
return std::chrono::milliseconds(periodMicros / 1000);
Contributor

If this is always in milliseconds, how about just passing milliseconds, e.g. intervalMs?
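
Putting the naming, argument-order, and unit suggestions from this thread together, a task wrapper might look like the sketch below. `makePeriodicTask` and `RepeatingFn` are hypothetical names; the callback shape (run once, return the delay until the next run) is an assumption modeled on the `return std::chrono::milliseconds(...)` line in the snippet above, and logging goes to `std::cerr` instead of `LOG(ERROR)` to keep the example standard-library only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed callback shape: run the task once, then return how long the
// scheduler should wait before the next run.
using RepeatingFn = std::function<std::chrono::milliseconds()>;

// Wraps 'func' so that an exception is logged instead of escaping into the
// scheduler thread. The interval is taken in milliseconds directly, so no
// unit conversion is needed on return.
template <typename TFunc>
RepeatingFn makePeriodicTask(
    const std::string& name,
    TFunc&& func,
    uint64_t intervalMs) {
  assert(intervalMs > 0);
  return [name, task = std::forward<TFunc>(func), intervalMs]() mutable
             -> std::chrono::milliseconds {
    try {
      task();
    } catch (const std::exception& e) {
      std::cerr << "Error running periodic task " << name << ": " << e.what()
                << '\n';
    }
    return std::chrono::milliseconds(intervalMs);
  };
}
```

A caller would hand the returned function to whatever scheduler it uses; the wrapper itself never sleeps or spawns threads.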

cache::CacheStats lastCacheStats_;
cache::SsdCacheStats lastSsdStats_;
velox::memory::MemoryArbitrator::Stats lastArbitratorStats_;
folly::ThreadedRepeatingFunctionRunner scheduler_;
Contributor

Put scheduler_ ahead of saved last stats? Thanks!

arbitrator_(arbitrator) {}

void PeriodicStatsReportDaemon::start() {
addTask(
Contributor

addAllocatorStatsReport();
addCacheStatsReport();
addArbitratorStatsReport()?

Thanks!

Contributor Author

Shall we keep it this way instead of creating three methods for three one-liners, to keep the class a bit simpler?

Contributor

It is fine. Leave one empty line in between. Thanks!

}
}

lastCacheStats_ = cacheStats;
Contributor

Let's just access ssdStats from cacheStats?

Contributor Author

We can't. ssdStats is a shared_ptr

Contributor

We can't keep a shared_ptr?

@tanjialiang
Contributor Author

Let's do it in this PR? We can't duplicate the report from the Velox side.

@xiaoxmeng We can keep it this way and, when we plug this class into presto-native, remove the duplicated ones from presto-native to avoid double reporting.

It can be quite easy to forget to add back these particular scattered lines. What do you think?

@tanjialiang tanjialiang force-pushed the stats_daemon branch 2 times, most recently from 1cc913b to dd1db8a Compare April 30, 2024 05:22
#include "velox/common/memory/Memory.h"
#include "velox/common/memory/MmapAllocator.h"

namespace {
Contributor

Move anonymous namespace inside velox namespace

@tanjialiang tanjialiang force-pushed the stats_daemon branch 2 times, most recently from 0aae33e to 05854c7 Compare April 30, 2024 06:41
const velox::memory::MemoryAllocator* allocator,
const velox::cache::AsyncDataCache* cache,
const velox::memory::MemoryArbitrator* arbitrator,
Options options = Options());
Contributor

const Options& options

cache_(cache),
arbitrator_(arbitrator),
options_(options) {
lastCacheStats_.ssdStats = std::make_shared<cache::SsdCacheStats>();
Contributor

Why not explicitly check if ssd is null in the stats update?

Contributor Author

we can do that, too
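
One way the explicit null check could look, as opposed to pre-populating the saved snapshot with an empty `SsdCacheStats`. The structs below are hypothetical, simplified stand-ins; the only detail taken from this thread is that `ssdStats` is a `shared_ptr` member of the cache stats. The field `bytesWritten` and the function `ssdBytesWrittenDelta` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified stand-ins for the cache stats types.
struct SsdStats {
  uint64_t bytesWritten{0};
};

struct CacheStats {
  uint64_t numNew{0};
  // Null when no SSD cache is configured.
  std::shared_ptr<SsdStats> ssdStats;
};

// Returns the SSD bytes written since the 'last' snapshot.
// Returns 0 when 'current' carries no SSD stats; a missing SSD snapshot in
// 'last' is treated as a zero baseline.
uint64_t ssdBytesWrittenDelta(const CacheStats& current, const CacheStats& last) {
  if (current.ssdStats == nullptr) {
    return 0;
  }
  const uint64_t previous =
      last.ssdStats != nullptr ? last.ssdStats->bytesWritten : 0;
  // Cumulative counter, so it should never go backwards.
  assert(current.ssdStats->bytesWritten >= previous);
  return current.ssdStats->bytesWritten - previous;
}
```

Note that copying a struct like this copies the pointer, not the pointee, so the saved snapshot is only stable if the producer hands out a fresh `SsdStats` object per refresh. Whether that holds for the real cache is not stated in this thread.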

velox/common/base/PeriodicStatsReporter.cpp Outdated
@tanjialiang tanjialiang force-pushed the stats_daemon branch 3 times, most recently from 3782229 to f1d8d68 Compare April 30, 2024 23:13
constexpr folly::StringPiece kMetricArbitratorArbitrationTimeUs{
"velox.arbitrator_arbitration_time_us"};

constexpr folly::StringPiece kMetricArbitratorNumShrunkBytes{
Contributor

Let's skip reporting the following three for now?

Contributor Author

We should keep the migration the same as presto-native and modify the migrated ones in followups if needed

@@ -128,4 +119,28 @@ constexpr folly::StringPiece kMetricSpillWriteTimeMs{

constexpr folly::StringPiece kMetricFileWriterEarlyFlushedRawBytes{
"velox.file_writer_early_flushed_raw_bytes"};

constexpr folly::StringPiece kMetricArbitratorNumRequests{
Contributor

Can you update the documentation for the new ones? Thanks!

Contributor Author

Counter documentation lives in the .cpp file per Velox style, which differs from Presto.

REPORT_IF_NOT_ZERO(
kMetricArbitratorNumReclaimedBytes,
deltaArbitratorStats.numReclaimedBytes);
REPORT_IF_NOT_ZERO(
Contributor

Shall we report kMetricArbitratorFreeCapacityBytes and kMetricArbitratorFreeReservedCapacityBytes from the latest stats? Thanks!

@tanjialiang tanjialiang force-pushed the stats_daemon branch 2 times, most recently from 7ca5de5 to c346bc4 Compare May 1, 2024 00:07
@@ -113,8 +113,8 @@ Memory Management
is not sufficient. It excludes counting instances where the operator is in a
non-reclaimable state due to currently being on-thread and running or being already
cancelled.
* - arbitrator_requests_count
Contributor

Let's keep the original names?

tanjialiang added a commit to tanjialiang/velox-1 that referenced this pull request May 1, 2024
Summary:
The stats report daemon is used for periodically exporting velox metrics. Current supported metrics are memory related metrics. There will be followups for additional metrics.

Pull Request resolved: facebookincubator#9653

Reviewed By: xiaoxmeng

Differential Revision: D56690811

Pulled By: tanjialiang
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D56690811

@facebook-github-bot
Contributor

@tanjialiang merged this pull request in ebcbec7.

Conbench analyzed the 1 benchmark run on commit ebcbec72.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
Summary:
The stats report daemon is used for periodically exporting velox metrics. Current supported metrics are memory related metrics. There will be followups for additional metrics.

Pull Request resolved: facebookincubator#9653

Reviewed By: xiaoxmeng

Differential Revision: D56690811

Pulled By: tanjialiang

fbshipit-source-id: e6f7236df9ea1445355f72b6f94b52704e0e1f4e
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
3 participants