Add query replayer #10897
Conversation
@duanmeng looks good % minors. Thanks!
@duanmeng thanks for the update!
@duanmeng thanks for the update % minors
velox/core/QueryConfig.h
Outdated
@@ -357,6 +357,14 @@ class QueryConfig {
  /// Empty string if only want to trace the query metadata.
  static constexpr const char* kQueryTraceNodeIds = "query_trace_node_ids";

  /// The max trace bytes limit, if it is zero, then tracing is disabled.
These changes are nice. Can they be extracted into a separate PR? Can we include these in the PR description?
Do you mean to add trace size and task limit in a follow-up PR?
> Can we include these in the PR description?
I've updated these in the PR description.
velox/core/QueryConfig.h
Outdated
@@ -689,6 +697,16 @@ class QueryConfig {
    return get<std::string>(kQueryTraceNodeIds, "");
  }

  uint64_t queryTraceMaxBytes() const {
    // The default query trace bytes, 0 by default.
This comment is redundant. It simply repeats the code on the next line. Let's remove.
Got it, fixed.
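For readers following along: the diff hunks above are clipped. A plausible completion of the new config entry, mirroring the existing `kQueryTraceNodeIds` getter pattern, is sketched below; the constant name, the getter body, and the stubbed `get<T>()` are assumptions inferred from the visible lines, not the exact code in this PR.

```cpp
// Sketch only: a plausible completion of the truncated QueryConfig diff above.
#include <cstdint>
#include <string>
#include <unordered_map>

class QueryConfigSketch {
 public:
  /// The max trace bytes limit. Tracing is disabled if zero.
  static constexpr const char* kQueryTraceMaxBytes = "query_trace_max_bytes";

  uint64_t queryTraceMaxBytes() const {
    // Returns 0 (tracing disabled) unless the config is explicitly set.
    return get<uint64_t>(kQueryTraceMaxBytes, 0);
  }

 private:
  // Minimal stand-in for QueryConfig's typed config lookup.
  template <typename T>
  T get(const char* key, T defaultValue) const {
    auto it = values_.find(key);
    return it == values_.end() ? defaultValue
                               : static_cast<T>(std::stoull(it->second));
  }

  std::unordered_map<std::string, std::string> values_;
};
```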
velox/docs/configs.rst
Outdated
* - query_trace_max_bytes
  - integer
  - 0
  - The max trace bytes limit, if it is zero, then tracing is disabled.
nit: The max trace bytes limit. Tracing is disabled if zero.
Fixed
issues. It helps prevent interference from the test noises in a production
environment (such as storage, network etc) by allowing replay of a part of the
query plan and data set in an isolated environment such as a local machine.
This is much more efficient for query performance debugging as we don't have to
This is useful for debugging query performance ..
Fixed
- When the query starts, the task records the metadata including query plan fragment,
  query configuration, and connector properties.
- During the query running, each traced operator records the input vectors and saves
running -> execution
Do we store each vector in a separate file? Or do we store all vectors in the same file?
We store all the vectors in a single file per operator per driver; for example, if the degree of parallelism of the traced operator is 3 (3 drivers), there will be 3 files.
  query configuration, and connector properties.
- During the query running, each traced operator records the input vectors and saves
  in the specified storage location.
- The metadata are serialized using json format and operator data inputs are serialized
I assume not all connectors support this. What happens if connector is not serializable?
Does this also apply to TableScan operator? I assume no, but I can't find any discussion about TableScan here.
Yes, only the Hive connector is supported at present. We plan to support TableScan by tracing only its input splits (and will update the document accordingly), and to support the other operators as well. The operator support plan is listed at #9668 (comment).
- Apply the recorded query configuration and connector properties to replay the query/task
  with the same input and configuration setup as in production.

**NOTE**: the presto serialization might lose the input vector encoding such as lazy vector
The Presto serialization doesn't always preserve vector encoding (lazy vectors are loaded, nested dictionaries are flattened). Hence, replay may differ from the original run.
Updated
The tracing framework consists of three components:

1. **Query Trace Writer**: metadata writer and the data writer.
metadata and data writer
metadata and data reader
updated
- Plan fragment of the task (also known as a plan node tree). It can be serialized
  as a JSON object, which is already supported in Velox.

**QueryDataWriter** records the input vectors from the target operator, which are
Does this apply to TableScan operator?
No, only input splits are traced for TableScan operator.
It is used as the utility to replay the input data as a source operator in the target
operator replay.

**NOTE**: `QueryDataWriter` serializes and flushes the input vectors in batches,
Can we use this tool to replay crashes? Will the tool be able to read "partial traces" produced before the crash? CC: @xiaoxmeng
Yes, I believe so. We record the plan node tree during task creation and record the input vectors in `Operator::addInput`, so we can use the partial input data to replay the crashed query.
Yeah, then we need to make sure the replay doesn't depend on the summary file.
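To illustrate the idea discussed in this thread, here is a conceptual sketch with stand-in names (not the actual Velox code): because each input batch is persisted inside `addInput` before the operator processes it, a crash mid-query still leaves a usable partial trace.

```cpp
// Conceptual sketch only, not the Velox implementation. Stand-in types show
// why partial traces survive a crash: every batch is written out before the
// wrapped operator touches it.
#include <memory>
#include <utility>
#include <vector>

struct Batch {};  // stand-in for a RowVector of input data

struct TraceDataWriter {
  std::vector<Batch> persisted;  // stand-in for the per-driver trace file
  void write(const Batch& batch) { persisted.push_back(batch); }
};

struct Operator {
  virtual ~Operator() = default;
  virtual void addInput(Batch input) = 0;
};

struct TracedOperator : Operator {
  std::shared_ptr<TraceDataWriter> writer;
  std::shared_ptr<Operator> delegate;

  void addInput(Batch input) override {
    writer->write(input);                  // persist first ...
    delegate->addInput(std::move(input));  // ... then process
  }
};
```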
.. code-block:: c++

   query_trace_tool --root $root_dir --summary --pretty
Is replay a separate tool?
I still think that showing plan as JSON is not very useful. Can we show it in a more user-friendly format?
> Is replay a separate tool?

No, I forgot to update the name; it should be `query_replayer`.
> I still think that showing plan as JSON is not very useful. Can we show it in a more user-friendly format?

The plan is recorded during task creation, so there is no stats information in it. It is a bonus that allows users to get a preliminary understanding of the traced data before replaying the query.
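(For reference, with the rename above the documented summary command would presumably read `query_replayer --root $root_dir --summary --pretty`; the flags are taken from the snippet under review.)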
Should we remove the plan JSON display and only list the traced task IDs? @mbasmanova @xiaoxmeng
It would be nice to allow for showing the plan, just not using JSON. Why can't we display it similar to printPlanWithStats, just without the stats?
Got it, we need to use a human-friendly way to show the plan instead of JSON, similar to the way used in `printPlanWithStats` although without the stats. Do I understand correctly? @mbasmanova
Yup.
Fixed by using `queryPlan->toString(true, true)`. Now the plan output is human-friendly, as follows :)
-- HashJoin[5][INNER c0=u0, filter: lt(ROW["c0"],135)] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT
-- Project[1][expressions: (c0:BIGINT, ROW["c0"]), (c1:SMALLINT, ROW["c1"]), (c2:TINYINT, ROW["c2"])] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT
-- Values[0][1 rows in 1 vectors] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT, c3:VARCHAR
-- Project[4][expressions: (u0:BIGINT, ROW["c0"]), (u1:SMALLINT, ROW["c1"]), (u2:TINYINT, ROW["a0"])] -> u0:BIGINT, u1:SMALLINT, u2:TINYINT
-- Aggregation[3][SINGLE [c0, c1] a0 := min(ROW["c2"])] -> c0:BIGINT, c1:SMALLINT, a0:TINYINT
-- Values[2][1 rows in 1 vectors] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT, c3:VARCHAR
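A minimal sketch of how such output can be produced, assuming a deserialized plan node is already in hand (the helper function name is illustrative; `toString(true, true)` is the call mentioned above):

```cpp
// Minimal sketch: render a plan node tree in the human-readable form shown
// above, without stats.
#include <iostream>
#include "velox/core/PlanNode.h"

void printTracedPlan(const facebook::velox::core::PlanNodePtr& queryPlan) {
  // toString(detailed, recursive): detailed node info, whole tree, no stats.
  std::cout << queryPlan->toString(true, true) << std::endl;
}
```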
@mbasmanova Thanks for your review, could you help take another look?
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@xiaoxmeng merged this pull request in cc46d81.
Conbench analyzed the 1 benchmark run on this commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary: Velox can record the query metadata (query plan and configs) during task creation and the input vectors of the traced operator, see facebookincubator#10774 and facebookincubator#10815. This PR adds a query replayer that can be used to replay a query locally using the metadata and input vectors from the production environment. At present it supports showing the summary of a query; replay support for more traced operators will be added in the future. This PR also adds two query configs, `query_trace_max_bytes` and `query_trace_task_reg_exp`, to constrain the recorded input data size and the traced tasks respectively, to ensure the stability of the cluster in production. Part of facebookincubator#9668 Pull Request resolved: facebookincubator#10897 Reviewed By: tanjialiang Differential Revision: D62336733 Pulled By: xiaoxmeng fbshipit-source-id: d196738dfa92c29fe5de67a944f652a328903814
Velox can record the query metadata (query plan and configs) during task
creation and the input vectors of the traced operator, see #10774 and #10815.
This PR adds a query replayer that can be used to replay a query locally
using the metadata and input vectors from the production environment.
At present it supports showing the summary of a query; replay support for
more traced operators will be added in the future.
This PR also adds two query configs, `query_trace_max_bytes` and
`query_trace_task_reg_exp`, to constrain the recorded input data size and
the traced tasks respectively, to ensure the stability of the cluster
in production.
Part of #9668
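A hedged sketch of how the two new configs might be set when building a query: the key strings come from this PR, but the values are made-up examples and the plain map is only a stand-in for however the configs reach the query context.

```cpp
// Illustrative only: config keys are from this PR; values and the bare map
// are placeholders.
#include <cstdint>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, std::string> makeTraceConfig() {
  return {
      // Cap recorded input data at 100MB; 0 (the default) disables tracing.
      {"query_trace_max_bytes", std::to_string(uint64_t{100} << 20)},
      // Only tasks whose IDs match this regex are traced (example pattern).
      {"query_trace_task_reg_exp", ".*20240906.*"},
  };
}
```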