[native] Expose an API to clean up async data cache on node #24530

agrawalreetika · 2025-02-11T10:07:55Z

Description

Expose an API to clean up the async data cache on the node

Motivation and Context

Expose an API to clean up the async data cache on the node

Impact

New API addition:

curl -X PUT -d " \"CLEAN_ASYNC_DATA_CACHE\" " -H "Content-type: application/json" "http://localhost:7777/v1/memory"
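For readers who want to see what that curl call puts on the wire, here is a minimal sketch that builds the equivalent raw HTTP/1.1 request text. The helper name buildPutRequest is hypothetical and not part of this PR; host, path, and body are taken from the command above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the PR): builds the raw HTTP/1.1 text for a
// PUT request carrying a JSON body, mirroring the curl command above.
inline std::string buildPutRequest(
    const std::string& host,
    const std::string& path,
    const std::string& jsonBody) {
  std::string request;
  request += "PUT " + path + " HTTP/1.1\r\n";
  request += "Host: " + host + "\r\n";
  request += "Content-Type: application/json\r\n";
  // Content-Length is the byte length of the body, quotes included.
  request += "Content-Length: " + std::to_string(jsonBody.size()) + "\r\n";
  request += "\r\n";  // blank line separates headers from body
  request += jsonBody;
  return request;
}
```

Calling buildPutRequest("localhost:7777", "/v1/memory", "\"CLEAN_ASYNC_DATA_CACHE\"") yields a request whose body is the JSON string the endpoint expects, per the PR description.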

Test Plan

Test Added

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

== NO RELEASE NOTE ==

yingsu00 · 2025-02-11T23:20:04Z

presto-native-execution/presto_cpp/main/PrestoServer.cpp

+    if (nodeState() == NodeState::kActive) {
+      auto* asyncDataCache = velox::cache::AsyncDataCache::getInstance();
+      if (asyncDataCache != nullptr) {
+        asyncDataCache->clear();


There is a VELOX_CHECK in clear() that might fail. Shall we catch it, log the failure and rethrow?

void AsyncDataCache::clear() {
  for (auto& shard : shards_) {
    memory::Allocation unused;
    shard->evict(std::numeric_limits<uint64_t>::max(), true, 0, unused);
    VELOX_CHECK(unused.empty());
  }
}
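A minimal sketch of the catch, log, and rethrow pattern suggested above. It assumes the failed check surfaces as an exception derived from std::exception. The names clearWithLogging and failingClear are hypothetical stand-ins, and a plain vector of strings replaces the real logger so the sketch has no Velox or glog dependency.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a clear() call whose internal check fails.
inline void failingClear() {
  throw std::runtime_error("unused.empty() check failed");
}

// Runs clearFn, records one log line for either outcome, and rethrows on
// failure so the caller still sees the original exception.
inline void clearWithLogging(
    const std::function<void()>& clearFn,
    std::vector<std::string>& log) {
  assert(clearFn && "clearFn must be callable");
  try {
    clearFn();
    log.push_back("Async data cache cleanup succeeded");
  } catch (const std::exception& e) {
    log.push_back(
        std::string("Async data cache cleanup failed: ") + e.what());
    throw;  // rethrow without slicing or replacing the exception
  }
}
```

In the real handler the callable would wrap asyncDataCache->clear(), and the log lines would go to LOG(INFO) and LOG(ERROR) respectively.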

yingsu00 · 2025-02-11T23:22:44Z

presto-native-execution/presto_cpp/main/PrestoServer.cpp

+        LOG(INFO) << "async data cache clean up is successful";
+      }
+      else {
+        LOG(ERROR) << "Issue in async data cache clean up";


Let's be more specific, and just say "Cannot acquire the AsyncDataCache instance"
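A small sketch of how the two branches could report distinct messages. FakeAsyncDataCache is a hypothetical stand-in for velox::cache::AsyncDataCache, and cleanCacheAndDescribe is an illustrative name, not code from this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the real cache; it only records that clear() ran.
struct FakeAsyncDataCache {
  bool cleared = false;
  void clear() { cleared = true; }
};

// Returns the message the handler would log for the given cache pointer.
inline std::string cleanCacheAndDescribe(FakeAsyncDataCache* cache) {
  if (cache == nullptr) {
    // The specific wording proposed in the review comment.
    return "Cannot acquire the AsyncDataCache instance";
  }
  cache->clear();
  return "Async data cache cleanup succeeded";
}
```

With this shape the error text names the actual failure (no cache instance) instead of a generic cleanup problem.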

yingsu00 · 2025-02-11T23:35:43Z

...e-execution/src/test/java/com/facebook/presto/nativeworker/PrestoNativeQueryRunnerUtils.java

+                nativeQueryRunnerParameters.workerCount,
+                cacheMaxSize,
+                DEFAULT_STORAGE_FORMAT,
+                true,


Why are we setting addStorageFormatToPath to true here?

agrawalreetika · Feb 12, 2025

I just added this to include the storage format name in the data folder path.

yingsu00 · 2025-02-11T23:50:34Z

...src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java

+        assertEquals(0, finalMetrics.entries, "Cache should be empty after cleanup.");
+    }
+
+    private Metrics collectCacheMetrics(Set<InternalNode> workerNodes, DistributedQueryRunner distributedQueryRunner, String endpoint)


distributedQueryRunner is not used

yingsu00 · 2025-02-12T00:01:53Z

...src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java

+        int hits = 0;
+        int entries = 0;
+        for (InternalNode worker : workerNodes) {
+            Map<String, Long> metrics = fetchMetrics(worker.getInternalUri().toString(), endpoint, "GET");


It seems this function only supports scalar integer metrics. How do you plan to collect histogram-type metrics in the future? Will you add new logic to this method or create another method? If it's the latter, it may be clearer to rename this one to fetchScalarLongMetrics.
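To make the scalar-only limitation concrete, here is a rough sketch of such a parser. The helper in the PR is Java; this C++ version is illustrative only. It assumes the endpoint returns Prometheus-style text lines of the form "name value", which this thread does not confirm, and parseScalarLongMetrics is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser: keeps only unlabeled metrics with integer values.
// Labeled series (for example histogram buckets) and non-integer values are
// skipped, which is why a histogram would need separate handling.
inline std::map<std::string, long long> parseScalarLongMetrics(
    const std::string& body) {
  std::map<std::string, long long> metrics;
  std::istringstream lines(body);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty() || line[0] == '#') {
      continue;  // blank line or comment such as "# TYPE ..."
    }
    const std::size_t space = line.find(' ');
    if (space == std::string::npos) {
      continue;  // no value on this line
    }
    const std::string name = line.substr(0, space);
    if (name.find('{') != std::string::npos) {
      continue;  // labeled series, not a plain scalar
    }
    const std::string value = line.substr(space + 1);
    try {
      std::size_t consumed = 0;
      const long long parsed = std::stoll(value, &consumed);
      if (consumed == value.size()) {
        metrics[name] = parsed;  // whole token was an integer
      }
    } catch (const std::exception&) {
      // Not an integer at all; ignore the line.
    }
  }
  return metrics;
}
```

A histogram collector would need to keep the labels and bucket boundaries, so a second method (or a richer return type) is the likely direction if that support is added.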

yingsu00 · 2025-02-12T00:05:25Z

...e-execution/src/test/java/com/facebook/presto/nativeworker/PrestoNativeQueryRunnerUtils.java

@@ -121,7 +139,7 @@ public static QueryRunner createQueryRunner(

        defaultQueryRunner.close();

-        return createNativeQueryRunner(dataDirectory.get().toString(), prestoServerPath.get(), workerCount, cacheMaxSize, true, Optional.empty(), storageFormat, addStorageFormatToPath, false, isCoordinatorSidecarEnabled, false);
+        return createNativeQueryRunner(dataDirectory.get().toString(), prestoServerPath.get(), workerCount, cacheMaxSize, true, Optional.empty(), storageFormat, addStorageFormatToPath, false, isCoordinatorSidecarEnabled, false, enableRuntimeMetricsCollection);


This line is too long

jaystarshot · 2025-02-12T18:21:25Z

What is the use case for clearing the cache? Don't we already have a pushback mechanism now?

pramodsatya

Thanks @agrawalreetika.

pramodsatya · 2025-02-13T16:13:42Z

presto-native-execution/presto_cpp/main/PrestoServer.h

@@ -211,6 +211,10 @@ class PrestoServer {

  void reportNodeStatus(proxygen::ResponseHandler* downstream);

+  void cleanAsynDataCache(


nit: cleanAsyncDataCache

pramodsatya · 2025-02-13T16:22:39Z

...e-execution/src/test/java/com/facebook/presto/nativeworker/PrestoNativeQueryRunnerUtils.java

+                false);
+    }
+
+    public static QueryRunner createQueryRunner(boolean enableRuntimeMetricsCollection)


It would be better to add this parameter to the existing createQueryRunner function:

public static QueryRunner createQueryRunner(boolean addStorageFormatToPath, boolean isCoordinatorSidecarEnabled)

pramodsatya · 2025-02-13T16:28:40Z

...src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java

+        int entries = 0;
+        for (InternalNode worker : workerNodes) {
+            Map<String, Long> metrics = fetchScalarLongMetrics(worker.getInternalUri().toString(), endpoint, "GET");
+            hits += metrics.getOrDefault("velox_memory_cache_num_hits", 0L);


Where are the configs velox_memory_cache_num_hits and velox_memory_cache_num_entries defined?

pramodsatya · 2025-02-13T16:33:30Z

presto-native-execution/presto_cpp/main/PrestoServer.cpp

@@ -331,6 +331,14 @@ void PrestoServer::run() {
          proxygen::ResponseHandler* downstream) {
        server->reportMemoryInfo(downstream);
      });
+  httpServer_->registerPut(


Why is the v1/memory endpoint overloaded? Might be better to add a new endpoint v1/memory/clear
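A toy sketch of the alternative being proposed, with one route per operation. TinyRouter and makeMemoryRouter are hypothetical and unrelated to the server's actual httpServer_ or proxygen API. The point is only that GET v1/memory and PUT v1/memory/clear stay independent entries.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical router keyed on (method, path); not the PrestoServer API.
class TinyRouter {
 public:
  using Handler = std::function<std::string()>;

  void registerRoute(
      const std::string& method,
      const std::string& path,
      Handler handler) {
    assert(handler && "handler must be callable");
    routes_[Key(method, path)] = std::move(handler);
  }

  // Returns the handler's response, or a 404 marker for unknown routes.
  std::string dispatch(
      const std::string& method,
      const std::string& path) const {
    const auto it = routes_.find(Key(method, path));
    return it == routes_.end() ? "404 not found" : it->second();
  }

 private:
  using Key = std::pair<std::string, std::string>;
  std::map<Key, Handler> routes_;
};

// Registers the memory report and the cache clear as separate endpoints.
inline TinyRouter makeMemoryRouter() {
  TinyRouter router;
  router.registerRoute(
      "GET", "/v1/memory", [] { return std::string("memory info"); });
  router.registerRoute(
      "PUT", "/v1/memory/clear", [] { return std::string("cache cleared"); });
  return router;
}
```

With separate routes, a PUT to /v1/memory is simply unregistered, and the handler does not need to inspect the request body to decide which action to run.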

agrawalreetika requested a review from a team as a code owner February 11, 2025 10:07

prestodb-ci added the from:IBM (PR from IBM) label Feb 11, 2025

prestodb-ci requested review from a team, pramodsatya and NivinCS and removed request for a team February 11, 2025 10:07

agrawalreetika self-assigned this Feb 11, 2025

agrawalreetika requested a review from yingsu00 February 11, 2025 10:08

agrawalreetika changed the title ~~Expose an API to clean up async data cache on node~~ [native] Expose an API to clean up async data cache on node Feb 11, 2025

yingsu00 reviewed Feb 12, 2025

Expose an API to clean up async data cache on node

eba0f8a

agrawalreetika force-pushed the native-memory-cleanup branch from f47d63b to eba0f8a Compare February 12, 2025 17:33

pramodsatya reviewed Feb 13, 2025
