Add query replayer #10897

Closed
wants to merge 1 commit into from

@duanmeng (Collaborator) commented Aug 30, 2024

Velox can record the query metadata (query plan and configs) during
task creation, as well as the input vectors of traced operators;
see #10774 and #10815.

This PR adds a query replayer that can replay a query locally using the
metadata and input vectors captured in the production environment.
At present it supports showing a summary of a traced query; replay
support for more traced operators will be added in the future.

Also, this PR adds two query configs, query_trace_max_bytes and
query_trace_task_reg_exp, which constrain the size of the recorded input
data and the set of traced tasks, respectively, to keep the cluster
stable in production.

Part of #9668
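To make the intended semantics of the two new configs concrete, here is a minimal C++ sketch. `TraceGate` is a hypothetical helper, not a Velox class. It assumes that a zero `query_trace_max_bytes` disables tracing (as this PR's config documentation states), that the byte budget is tracked per gate instance, and that `query_trace_task_reg_exp` must fully match a task ID (`std::regex_match`); the actual checks in Velox may differ on each of these points.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper (not a Velox API) sketching how the two configs could
// gate tracing: query_trace_max_bytes caps recorded input bytes (0 disables
// tracing) and query_trace_task_reg_exp selects which tasks are traced.
class TraceGate {
 public:
  TraceGate(uint64_t maxBytes, const std::string& taskRegExp)
      : maxBytes_(maxBytes), taskRegExp_(taskRegExp) {}

  // Tracing is disabled entirely when the byte limit is zero.
  bool enabled() const {
    return maxBytes_ > 0;
  }

  // Assumption: the task ID must fully match the configured expression.
  bool shouldTraceTask(const std::string& taskId) const {
    return enabled() && std::regex_match(taskId, taskRegExp_);
  }

  // Accepts 'bytes' only if the running total stays within the limit.
  bool tryRecord(uint64_t bytes) {
    assert(tracedBytes_ <= maxBytes_);
    // Compare against the remaining budget to avoid unsigned overflow.
    if (!enabled() || bytes > maxBytes_ - tracedBytes_) {
      return false;
    }
    tracedBytes_ += bytes;
    return true;
  }

  uint64_t tracedBytes() const {
    return tracedBytes_;
  }

 private:
  const uint64_t maxBytes_;
  const std::regex taskRegExp_;
  uint64_t tracedBytes_{0};
};
```

Under these assumptions, a gate with a 100-byte limit accepts a 60-byte batch, rejects a following 50-byte batch, and still accepts a 40-byte one; a gate with a zero limit traces nothing regardless of the task expression.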

@duanmeng duanmeng requested a review from xiaoxmeng August 30, 2024 04:24
@facebook-github-bot added the CLA Signed label Aug 30, 2024
@duanmeng duanmeng marked this pull request as draft August 30, 2024 04:24

netlify bot commented Aug 30, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: c4b3f48
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/66f0ccc0bf45bc00083a72ce

@duanmeng duanmeng force-pushed the trace_tool branch 4 times, most recently from a051b07 to f2f11ee Compare August 31, 2024 11:47
@duanmeng duanmeng changed the title Add trace tool Add query trace tool Aug 31, 2024
@duanmeng duanmeng marked this pull request as ready for review August 31, 2024 11:52
@duanmeng duanmeng force-pushed the trace_tool branch 9 times, most recently from 4a7719a to df172db Compare August 31, 2024 15:30
Contributor

@xiaoxmeng left a comment

@duanmeng looks good % minors. Thanks!

velox/exec/trace/QueryTraceUtil.h (outdated, resolved)
velox/exec/trace/QueryTraceUtil.cpp (outdated, resolved)
velox/exec/trace/QueryTraceUtil.cpp (outdated, resolved)
velox/exec/trace/QueryTraceUtil.cpp (resolved)
velox/exec/trace/QueryTraceUtil.cpp (outdated, resolved)
velox/tool/QueryTraceToolBase.cpp (outdated, resolved)
velox/tool/QueryTraceToolBase.cpp (outdated, resolved)
velox/tool/QueryTraceToolBase.cpp (outdated, resolved)
velox/exec/trace/QueryTraceUtil.h (outdated, resolved)
velox/tool/QueryTraceTool.cpp (outdated, resolved)
@duanmeng duanmeng force-pushed the trace_tool branch 4 times, most recently from 489beac to c7bad5b Compare September 1, 2024 04:47
Contributor

@xiaoxmeng left a comment

@duanmeng thanks for the update!

velox/exec/trace/QueryTraceUtil.cpp (outdated, resolved)
velox/tool/trace/QueryTraceTool.cpp (outdated, resolved)
velox/tool/trace/QueryTraceTool.cpp (outdated, resolved)
velox/tool/trace/QueryTraceTool.cpp (outdated, resolved)
velox/tool/trace/QueryTraceReplayer.cpp (outdated, resolved)
velox/tool/trace/QueryTraceReplayer.cpp (outdated, resolved)
velox/tool/trace/QueryTraceReplayer.cpp (resolved)
velox/tool/trace/QueryTraceReplayer.cpp (outdated, resolved)
velox/tool/trace/QueryTraceReplayer.cpp (outdated, resolved)
velox/exec/trace/test/QueryTraceTest.cpp (resolved)
@duanmeng duanmeng force-pushed the trace_tool branch 2 times, most recently from 05cd44b to 13bd5a0 Compare September 18, 2024 03:46
velox/core/QueryCtx.cpp (resolved)
velox/core/QueryCtx.cpp (outdated, resolved)
velox/exec/trace/QueryDataWriter.cpp (outdated, resolved)
velox/exec/trace/QueryDataWriter.cpp (outdated, resolved)
velox/exec/trace/QueryTraceTraits.h (outdated, resolved)
Contributor

@xiaoxmeng left a comment

@duanmeng thanks for the update % minors

velox/exec/trace/QueryDataWriter.cpp (resolved)
velox/exec/trace/QueryDataWriter.h (resolved)
velox/exec/Task.cpp (resolved)
velox/exec/Task.cpp (outdated, resolved)
@duanmeng duanmeng force-pushed the trace_tool branch 3 times, most recently from 7ee3e0a to 03f98ff Compare September 20, 2024 04:04
velox/core/QueryConfig.h (outdated, resolved)
velox/core/QueryConfig.h (outdated, resolved)
velox/core/QueryConfig.h (outdated, resolved)
velox/core/QueryConfig.h (resolved)
velox/core/QueryCtx.cpp (outdated, resolved)
@duanmeng duanmeng force-pushed the trace_tool branch 2 times, most recently from 2d6f151 to 2219367 Compare September 20, 2024 06:24
@duanmeng duanmeng force-pushed the trace_tool branch 3 times, most recently from bfa1813 to d781eff Compare September 20, 2024 08:22
@@ -357,6 +357,14 @@ class QueryConfig {
  /// Empty string if only want to trace the query metadata.
  static constexpr const char* kQueryTraceNodeIds = "query_trace_node_ids";

  /// The max trace bytes limit, if it is zero, then tracing is disabled.
Contributor

These changes are nice. Can they be extracted into a separate PR? Can we include these in the PR description?

Collaborator Author

Do you mean to add trace size and task limit in a follow-up PR?

Collaborator Author

> Can we include these in the PR description?

I've updated these in the PR description.

@@ -689,6 +697,16 @@ class QueryConfig {
    return get<std::string>(kQueryTraceNodeIds, "");
  }

  uint64_t queryTraceMaxBytes() const {
    // The default query trace bytes, 0 by default.
Contributor

This comment is redundant. It simply repeats the code on the next line. Let's remove.

Collaborator Author

Got it, fixed.

* - query_trace_max_bytes
  - integer
  - 0
  - The max trace bytes limit, if it is zero, then tracing is disabled.
Contributor

nit: The max trace bytes limit. Tracing is disabled if zero.

Collaborator Author

Fixed

issues. It helps prevent interference from the test noises in a production
environment (such as storage, network etc) by allowing replay of a part of the
query plan and data set in an isolated environment such as a local machine.
This is much more efficient for query performance debugging as we don't have to
Contributor

This is useful for debugging query performance ..

Collaborator Author

Fixed


- When the query starts, the task records the metadata including query plan fragment,
query configuration, and connector properties.
- During the query running, each traced operator records the input vectors and saves
Contributor

running -> execution

Do we store each vector in a separate file? Or do we store all vectors in the same file?

Collaborator Author

We store all the vectors in a single file per operator per driver; for example, there will be 3 files if the traced operator's degree of parallelism is 3 (3 drivers).

query configuration, and connector properties.
- During the query running, each traced operator records the input vectors and saves
in the specified storage location.
- The metadata are serialized using json format and operator data inputs are serialized
Contributor

I assume not all connectors support this. What happens if a connector is not serializable?

Does this also apply to TableScan operator? I assume no, but I can't find any discussion about TableScan here.

Collaborator Author

Yes, only the Hive connector is supported at present. We plan to support TableScan by tracing only its input splits, to add support for the other operators as well, and to update the document accordingly. The operator support plan is listed in #9668 (comment).

- Apply the recorded query configuration and connector properties to replay the query/task
with the same input and configuration setup as in production.

**NOTE**: the presto serialization might lose the input vector encoding such as lazy vector
Contributor

The Presto serialization doesn't always preserve vector encoding (lazy vectors are loaded, nested dictionaries are flattened). Hence, replay may differ from the original run.

Collaborator Author

Updated


The tracing framework consists of three components:

1. **Query Trace Writer**: metadata writer and the data writer.
Contributor

metadata and data writer

metadata and data reader

Collaborator Author

updated

- Plan fragment of the task (also known as a plan node tree). It can be serialized
as a JSON object, which is already supported in Velox.

**QueryDataWriter** records the input vectors from the target operator, which are
Contributor

Does this apply to TableScan operator?

Collaborator Author

No, only input splits are traced for TableScan operator.

It is used as the utility to replay the input data as a source operator in the target
operator replay.

**NOTE**: `QueryDataWriter` serializes and flushes the input vectors in batches,
Contributor

Can we use this tool to replay crashes? Will the tool be able to read "partial traces" produced before the crash? CC: @xiaoxmeng

Collaborator Author

Yes, I believe so. We record the plan node tree during task creation and the input vectors in Operator::addInput, so we can use the partial input data to replay the crashed query.

Contributor

Yeah, then we need to make sure the replay doesn't depend on the summary file.
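The question in this thread is whether a trace cut short by a crash is still usable. One common way to make batch-flushed data tolerant of truncation is to length-prefix each batch so that a reader can stop at the first incomplete one, without consulting any separate summary file. The C++ sketch below illustrates that idea only. The framing (an 8-byte size in native byte order followed by the payload), the in-memory `std::string` standing in for a trace file, and the names `appendBatch` and `readCompleteBatches` are all hypothetical; this is not the actual `QueryDataWriter` format, which per the discussion above serializes vectors with Presto serialization.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical framing, not Velox's trace format: each flushed batch is
// stored as an 8-byte size (native byte order) followed by its payload.
void appendBatch(std::string& file, const std::string& batch) {
  const uint64_t size = batch.size();
  file.append(reinterpret_cast<const char*>(&size), sizeof(size));
  file.append(batch);
}

// Returns every batch that was completely written. A truncated trailing
// batch, e.g. from a process that crashed mid-flush, is ignored so the
// complete prefix of the trace can still be replayed.
std::vector<std::string> readCompleteBatches(const std::string& file) {
  std::vector<std::string> batches;
  size_t offset = 0;
  while (file.size() - offset >= sizeof(uint64_t)) {
    uint64_t size;
    std::memcpy(&size, file.data() + offset, sizeof(size));
    offset += sizeof(size);
    assert(offset <= file.size());
    if (file.size() - offset < size) {
      break;  // Incomplete payload: stop at the last complete batch.
    }
    batches.push_back(file.substr(offset, size));
    offset += size;
  }
  return batches;
}
```

With three batches written and the last three bytes cut off, the reader returns the first two batches. This handles truncation only; a torn write that corrupts a size header would additionally need a per-batch checksum, which the sketch omits.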


.. code-block:: c++

   query_trace_tool --root $root_dir --summary --pretty
Contributor

Is replay a separate tool?

I still think that showing plan as JSON is not very useful. Can we show it in a more user-friendly format?

Collaborator Author

> Is replay a separate tool?

No, I forgot to update the name; it should be query_replayer.

> I still think that showing plan as JSON is not very useful. Can we show it in a more user-friendly format?

The plan is recorded during task creation, so it contains no stats. It is a bonus that gives users a preliminary understanding of the traced data before they replay the query.

Collaborator Author

Should we remove the plan JSON output and only list the traced task IDs? @mbasmanova @xiaoxmeng

Contributor

It would be nice to allow for showing the plan, just not using JSON. Why can't we display it similar to printPlanWithStats, just without the stats?

Collaborator Author

Got it: we should show the plan in a human-friendly format instead of JSON, similar to printPlanWithStats but without the stats. Do I understand correctly? @mbasmanova

Contributor

Yup.

Collaborator Author

Fixed by using queryPlan->toString(true, true). The plan is now shown in a human-friendly format, as follows :)

-- HashJoin[5][INNER c0=u0, filter: lt(ROW["c0"],135)] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT
  -- Project[1][expressions: (c0:BIGINT, ROW["c0"]), (c1:SMALLINT, ROW["c1"]), (c2:TINYINT, ROW["c2"])] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT
    -- Values[0][1 rows in 1 vectors] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT, c3:VARCHAR
  -- Project[4][expressions: (u0:BIGINT, ROW["c0"]), (u1:SMALLINT, ROW["c1"]), (u2:TINYINT, ROW["a0"])] -> u0:BIGINT, u1:SMALLINT, u2:TINYINT
    -- Aggregation[3][SINGLE [c0, c1] a0 := min(ROW["c2"])] -> c0:BIGINT, c1:SMALLINT, a0:TINYINT
      -- Values[2][1 rows in 1 vectors] -> c0:BIGINT, c1:SMALLINT, c2:TINYINT, c3:VARCHAR

Collaborator Author

@duanmeng left a comment

@mbasmanova Thanks for your review. Could you take another look?

@duanmeng duanmeng force-pushed the trace_tool branch 3 times, most recently from edc1c41 to 78d2057 Compare September 22, 2024 10:11
@facebook-github-bot
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@xiaoxmeng merged this pull request in cc46d81.

Conbench analyzed the 1 benchmark run on commit cc46d81e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
Pull Request resolved: facebookincubator#10897

Reviewed By: tanjialiang

Differential Revision: D62336733

Pulled By: xiaoxmeng

fbshipit-source-id: d196738dfa92c29fe5de67a944f652a328903814
Labels: CLA Signed, Merged
4 participants