Record source information of HBO stats #22234
Does it make sense to encode the environment and version? The environment could identify the deployment that wrote the stats (including the eval engine used), and the version could be used to identify any anomalies introduced in the metrics that may vary between versions, similar to how we embed the version in things like Parquet file metadata.
Currently I plan to include the type of workers and the query ID of the queries that produce the historical statistics. The environment and version can be inferred from the query ID with proper logging, so I do not see an immediate need for them; we can add them later if needed.
{
requireNonNull(objectMapper, "objectMapper is null");
this.sessionPropertyManager = requireNonNull(sessionPropertyManager, "sessionPropertyManager is null");
this.historyBasedStatisticsCacheManager = new HistoryBasedStatisticsCacheManager();
ObjectMapper newObjectMapper = objectMapper.copy().configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);
this.planCanonicalInfoProvider = new CachingPlanCanonicalInfoProvider(historyBasedStatisticsCacheManager, newObjectMapper, metadata);
this.config = requireNonNull(config, "config is null");
this.isNativeExecution = featuresConfig.isNativeExecutionEnabled();
Check whether native execution is enabled, and record that information when writing stats to HBO.
if (predictedPlanStatistics.getConfidence() > 0) {
return delegateStats.combineStats(
predictedPlanStatistics,
new HistoryBasedSourceInfo(entry.getKey().getHash(), inputTableStatistics, Optional.of(historicalPlanStatisticsEntry.get().getHistoricalPlanStatisticsEntryInfo())));
Add the source information to plan statistics returned from HBO.
HistoricalPlanStatisticsEntryInfo historicalPlanStatisticsEntryInfo = new HistoricalPlanStatisticsEntryInfo(
isNativeExecution ? HistoricalPlanStatisticsEntryInfo.WorkerType.CPP : HistoricalPlanStatisticsEntryInfo.WorkerType.JAVA, queryInfo.getQueryId());
Record the worker type and query ID when recording the HBO stats.
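To make the shape of the recorded information concrete, here is a minimal sketch of a value class that carries a worker type and a query ID. It is a simplified stand-in, not the class from this PR: the name `EntryInfoSketch`, the `forQuery` helper, and the use of a plain `String` for the query ID are assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for the entry info discussed above.
// The real class in the PR may differ in fields, types, and serialization.
final class EntryInfoSketch
{
    enum WorkerType { JAVA, CPP }

    private final WorkerType workerType;
    // Assumption: the query ID is kept as a plain string in this sketch.
    private final String queryId;

    EntryInfoSketch(WorkerType workerType, String queryId)
    {
        this.workerType = Objects.requireNonNull(workerType, "workerType is null");
        this.queryId = Objects.requireNonNull(queryId, "queryId is null");
    }

    // Mirrors the ternary in the diff: native execution maps to CPP, otherwise JAVA.
    static EntryInfoSketch forQuery(boolean isNativeExecution, String queryId)
    {
        return new EntryInfoSketch(isNativeExecution ? WorkerType.CPP : WorkerType.JAVA, queryId);
    }

    WorkerType getWorkerType()
    {
        return workerType;
    }

    String getQueryId()
    {
        return queryId;
    }
}
```

Keeping the object immutable and null-checked means that anything reading the stats later can rely on both fields being present.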
Can we get this isNativeExecution property via the session directly, e.g. queryInfo.getSession().toSession(sessionPropertyManager)? That way you would not need to inject featuresConfig.
@jaystarshot Jay, we explicitly removed this property from the sessions because it doesn't make sense to allow this to be modified for individual queries. This is a cluster-wide property. I have a PR to actually delete it: #22183
CC: @tdcmeehan
I like the idea of recording the version. That way, if there's a problem with stats from some release, or some big change in an operator makes old stats no longer relevant, we could programmatically exclude those stats from being used.
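As a rough illustration of what excluding stats by version could look like, the sketch below drops entries written by a given set of server versions. Everything in it is hypothetical: the `Entry` type, the `excludeVersions` helper, and the idea of keeping the excluded versions in a set are assumptions for illustration, not code from this PR.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: filter out historical stats entries by the server version that wrote them.
final class VersionFilterSketch
{
    // Hypothetical entry carrying only the fields needed for this example.
    static final class Entry
    {
        final String queryId;
        final String serverVersion;

        Entry(String queryId, String serverVersion)
        {
            this.queryId = Objects.requireNonNull(queryId, "queryId is null");
            this.serverVersion = Objects.requireNonNull(serverVersion, "serverVersion is null");
        }
    }

    private VersionFilterSketch() {}

    // Returns the entries whose server version is not in the excluded set, in their original order.
    static List<Entry> excludeVersions(List<Entry> entries, Set<String> excludedVersions)
    {
        return entries.stream()
                .filter(entry -> !excludedVersions.contains(entry.serverVersion))
                .collect(Collectors.toList());
    }
}
```

A caller would pass the entries read from the stats store together with the set of versions known to be problematic, and use only the returned entries for prediction.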
{
requireNonNull(objectMapper, "objectMapper is null");
this.sessionPropertyManager = requireNonNull(sessionPropertyManager, "sessionPropertyManager is null");
this.historyBasedStatisticsCacheManager = new HistoryBasedStatisticsCacheManager();
ObjectMapper newObjectMapper = objectMapper.copy().configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);
this.planCanonicalInfoProvider = new CachingPlanCanonicalInfoProvider(historyBasedStatisticsCacheManager, newObjectMapper, metadata);
this.config = requireNonNull(config, "config is null");
this.isNativeExecution = featuresConfig.isNativeExecutionEnabled();
this.serverVersion = requireNonNull(nodeVersion, "nodeVersion is null").toString();
Add information for server version
Add server version per suggestion.
...-spi/src/main/java/com/facebook/presto/spi/statistics/HistoricalPlanStatisticsEntryInfo.java
Record the type of workers (CPP, JAVA) and query ID of the queries which produce these stats.
Description
This PR records more information about HBO stats: the type of workers (currently C++ and Java) that the stats come from, and the ID of the query that generated them.
Motivation and Context
The worker type is added because HBO tracks the size of operator output, and this size can depend on the data structures used and on the compaction algorithm, when one is applied. Presto Java and Presto C++ can therefore be expected to report different sizes, so this information needs to be recorded in the HBO stats.
The query ID is added for debugging purposes: it helps to quickly identify the query that populated the stats.
Impact
Improves HBO stats by making them more precise and easier to debug.
Test Plan
Ran an end-to-end test locally to verify that these stats are available in logging.