Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Expose an API to clean up async data cache on node #24530

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

agrawalreetika
Copy link
Member

Description

Expose an API to clean up the async data cache on the node

Motivation and Context

Expose an API to clean up the async data cache on the node

Impact

New API addition -

curl -X PUT -d " \"CLEAN_ASYNC_DATA_CACHE\" " -H "Content-type: application/json" "http://localhost:7777/v1/memory"

Test Plan

Test Added

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@agrawalreetika agrawalreetika requested a review from a team as a code owner February 11, 2025 10:07
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Feb 11, 2025
@prestodb-ci prestodb-ci requested review from a team, pramodsatya and NivinCS and removed request for a team February 11, 2025 10:07
@agrawalreetika agrawalreetika self-assigned this Feb 11, 2025
@agrawalreetika agrawalreetika changed the title Expose an API to clean up async data cache on node [native] Expose an API to clean up async data cache on node Feb 11, 2025
if (nodeState() == NodeState::kActive) {
auto* asyncDataCache = velox::cache::AsyncDataCache::getInstance();
if (asyncDataCache != nullptr) {
asyncDataCache->clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a VELOX_CHECK in clear() that might fail. Shall we catch it, log the failure and rethrow?

void AsyncDataCache::clear() {
  for (auto& shard : shards_) {
    memory::Allocation unused;
    shard->evict(std::numeric_limits<uint64_t>::max(), true, 0, unused);
    VELOX_CHECK(unused.empty());
  }
}

LOG(INFO) << "async data cache clean up is successful";
}
else {
LOG(ERROR) << "Issue in async data cache clean up";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more specific, and just say "Cannot acquire the AsyncDataCache instance"

nativeQueryRunnerParameters.workerCount,
cacheMaxSize,
DEFAULT_STORAGE_FORMAT,
true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we setting addStorageFormatToPath to true here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added this to add storage name in the data folder path.

assertEquals(0, finalMetrics.entries, "Cache should be empty after cleanup.");
}

private Metrics collectCacheMetrics(Set<InternalNode> workerNodes, DistributedQueryRunner distributedQueryRunner, String endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distributedQueryRunner is not used

int hits = 0;
int entries = 0;
for (InternalNode worker : workerNodes) {
Map<String, Long> metrics = fetchMetrics(worker.getInternalUri().toString(), endpoint, "GET");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this function only supports scalar integer type metrics. How do you plan to collect histogram type metics in the future? Will you add new logic to this method or create another method? If it's the latter, maybe it'll be clearer to rename this one to fetchScalarLongMetrics?

@@ -121,7 +139,7 @@ public static QueryRunner createQueryRunner(

defaultQueryRunner.close();

return createNativeQueryRunner(dataDirectory.get().toString(), prestoServerPath.get(), workerCount, cacheMaxSize, true, Optional.empty(), storageFormat, addStorageFormatToPath, false, isCoordinatorSidecarEnabled, false);
return createNativeQueryRunner(dataDirectory.get().toString(), prestoServerPath.get(), workerCount, cacheMaxSize, true, Optional.empty(), storageFormat, addStorageFormatToPath, false, isCoordinatorSidecarEnabled, false, enableRuntimeMetricsCollection);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long

@jaystarshot
Copy link
Member

What is the use case to clear the cache? we already have a pushback mechanism now?

Copy link
Contributor

@pramodsatya pramodsatya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agrawalreetika.

@@ -211,6 +211,10 @@ class PrestoServer {

void reportNodeStatus(proxygen::ResponseHandler* downstream);

void cleanAsynDataCache(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cleanAsyncDataCache

false);
}

public static QueryRunner createQueryRunner(boolean enableRuntimeMetricsCollection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be better to add this parameter to the existing createQueryRunner function:

public static QueryRunner createQueryRunner(boolean addStorageFormatToPath, boolean isCoordinatorSidecarEnabled)

int entries = 0;
for (InternalNode worker : workerNodes) {
Map<String, Long> metrics = fetchScalarLongMetrics(worker.getInternalUri().toString(), endpoint, "GET");
hits += metrics.getOrDefault("velox_memory_cache_num_hits", 0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the configs velox_memory_cache_num_hits and velox_memory_cache_num_entries defined?

@@ -331,6 +331,14 @@ void PrestoServer::run() {
proxygen::ResponseHandler* downstream) {
server->reportMemoryInfo(downstream);
});
httpServer_->registerPut(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the v1/memory endpoint overloaded? Might be better to add a new endpoint v1/memory/clear

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants