-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Expose an API to clean up async data cache on node #24530
base: master
Are you sure you want to change the base?
[native] Expose an API to clean up async data cache on node #24530
Conversation
if (nodeState() == NodeState::kActive) { | ||
auto* asyncDataCache = velox::cache::AsyncDataCache::getInstance(); | ||
if (asyncDataCache != nullptr) { | ||
asyncDataCache->clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a VELOX_CHECK in clear() that might fail. Shall we catch it, log the failure and rethrow?
void AsyncDataCache::clear() {
for (auto& shard : shards_) {
memory::Allocation unused;
shard->evict(std::numeric_limits<uint64_t>::max(), true, 0, unused);
VELOX_CHECK(unused.empty());
}
}
LOG(INFO) << "async data cache clean up is successful"; | ||
} | ||
else { | ||
LOG(ERROR) << "Issue in async data cache clean up"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be more specific, and just say "Cannot acquire the AsyncDataCache instance"
nativeQueryRunnerParameters.workerCount, | ||
cacheMaxSize, | ||
DEFAULT_STORAGE_FORMAT, | ||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we setting addStorageFormatToPath to true here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added this to add storage name in the data folder path.
assertEquals(0, finalMetrics.entries, "Cache should be empty after cleanup."); | ||
} | ||
|
||
private Metrics collectCacheMetrics(Set<InternalNode> workerNodes, DistributedQueryRunner distributedQueryRunner, String endpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distributedQueryRunner is not used
int hits = 0; | ||
int entries = 0; | ||
for (InternalNode worker : workerNodes) { | ||
Map<String, Long> metrics = fetchMetrics(worker.getInternalUri().toString(), endpoint, "GET"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this function only supports scalar integer type metrics. How do you plan to collect histogram type metics in the future? Will you add new logic to this method or create another method? If it's the latter, maybe it'll be clearer to rename this one to fetchScalarLongMetrics
?
@@ -121,7 +139,7 @@ public static QueryRunner createQueryRunner( | |||
|
|||
defaultQueryRunner.close(); | |||
|
|||
return createNativeQueryRunner(dataDirectory.get().toString(), prestoServerPath.get(), workerCount, cacheMaxSize, true, Optional.empty(), storageFormat, addStorageFormatToPath, false, isCoordinatorSidecarEnabled, false); | |||
return createNativeQueryRunner(dataDirectory.get().toString(), prestoServerPath.get(), workerCount, cacheMaxSize, true, Optional.empty(), storageFormat, addStorageFormatToPath, false, isCoordinatorSidecarEnabled, false, enableRuntimeMetricsCollection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is too long
f47d63b
to
eba0f8a
Compare
What is the use case to clear the cache? we already have a pushback mechanism now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agrawalreetika.
@@ -211,6 +211,10 @@ class PrestoServer { | |||
|
|||
void reportNodeStatus(proxygen::ResponseHandler* downstream); | |||
|
|||
void cleanAsynDataCache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cleanAsyncDataCache
false); | ||
} | ||
|
||
public static QueryRunner createQueryRunner(boolean enableRuntimeMetricsCollection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be better to add this parameter to the existing createQueryRunner
function:
public static QueryRunner createQueryRunner(boolean addStorageFormatToPath, boolean isCoordinatorSidecarEnabled)
int entries = 0; | ||
for (InternalNode worker : workerNodes) { | ||
Map<String, Long> metrics = fetchScalarLongMetrics(worker.getInternalUri().toString(), endpoint, "GET"); | ||
hits += metrics.getOrDefault("velox_memory_cache_num_hits", 0L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the configs velox_memory_cache_num_hits
and velox_memory_cache_num_entries
defined?
@@ -331,6 +331,14 @@ void PrestoServer::run() { | |||
proxygen::ResponseHandler* downstream) { | |||
server->reportMemoryInfo(downstream); | |||
}); | |||
httpServer_->registerPut( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the v1/memory
endpoint overloaded? Might be better to add a new endpoint v1/memory/clear
Description
Expose an API to clean up the async data cache on the node
Motivation and Context
Expose an API to clean up the async data cache on the node
Impact
New API addition -
Test Plan
Test Added
Contributor checklist
Release Notes