Enable Estimated Stats on Query Plan for Instrumentation Monitoring #18682

Merged
yingsu00 merged 3 commits into prestodb:master from the graphvizstats branch on Dec 30, 2022

Conversation

fgwang7w
Member

@fgwang7w fgwang7w commented Nov 15, 2022

Test plan
Add a test case that verifies the entire plan, including stats info, in both GRAPHVIZ and JSON formats.

== RELEASE NOTES ==

General Changes
* Add estimated stats to the EXPLAIN plan output in GRAPHVIZ and JSON formats

Before Change:

                                                                                                                    Query Plan
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {
   "id" : "5",
   "name" : "Output",
   "identifier" : "[key1, value1]",
   "details" : "",
   "children" : [ {
     "id" : "80",
     "name" : "RemoteStreamingExchange",
     "identifier" : "[GATHER]",
     "details" : "",
     "children" : [ {
       "id" : "0",
       "name" : "TableScan",
       "identifier" : "[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=default, tableName=test_grouped_join1, analyzePartitionValues=Optional.empty}', layout='Optional[default.test_grouped_join1{buckets=13}]'}]",
       "details" : "LAYOUT: default.test_grouped_join1{buckets=13}\nvalue1 := value1:varchar(79):1:REGULAR (1:37)\nkey1 := key1:bigint:0:REGULAR (1:37)\n",
       "children" : [ ],
       "remoteSources" : [ ]
     } ],
     "remoteSources" : [ ]
   } ],
   "remoteSources" : [ ]
 }
(1 row)

Query 20221212_231733_00000_2kdqx, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

After Change:

                                                                                                                    Query Plan
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {
   "id" : "5",
   "name" : "Output",
   "identifier" : "[key1, value1]",
   "details" : "",
   "children" : [ {
     "id" : "80",
     "name" : "RemoteStreamingExchange",
     "identifier" : "[GATHER]",
     "details" : "",
     "children" : [ {
       "id" : "0",
       "name" : "TableScan",
       "identifier" : "[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=default, tableName=test_grouped_join1, analyzePartitionValues=Optional.empty}', layout='Optional[default.test_grouped_join1{buckets=13}]'}]",
       "details" : "LAYOUT: default.test_grouped_join1{buckets=13}\nvalue1 := value1:varchar(79):1:REGULAR (1:37)\nkey1 := key1:bigint:0:REGULAR (1:37)\n",
       "children" : [ ],
       "remoteSources" : [ ],
       "estimates" : [ {
         "outputRowCount" : 15000.0,
         "totalSize" : 1057364.0,
         "confident" : true,
         "variableStatistics" : {
           "value1<varchar(79)>" : {
             "lowValue" : "-Infinity",
             "highValue" : "Infinity",
             "nullsFraction" : 0.0,
             "averageRowSize" : 48.49093333333333,
             "distinctValuesCount" : 15000.0
           },
           "key1<bigint>" : {
             "lowValue" : 1.0,
             "highValue" : 60000.0,
             "nullsFraction" : 0.0,
             "averageRowSize" : "NaN",
             "distinctValuesCount" : 15000.0
           }
         }
       } ]
     } ],
     "remoteSources" : [ ],
     "estimates" : [ {
       "outputRowCount" : 15000.0,
       "totalSize" : 1057364.0,
       "confident" : true,
       "variableStatistics" : {
         "value1<varchar(79)>" : {
           "lowValue" : "-Infinity",
           "highValue" : "Infinity",
           "nullsFraction" : 0.0,
           "averageRowSize" : 48.49093333333333,
           "distinctValuesCount" : 15000.0
         },
         "key1<bigint>" : {
           "lowValue" : 1.0,
           "highValue" : 60000.0,
           "nullsFraction" : 0.0,
           "averageRowSize" : "NaN",
           "distinctValuesCount" : 15000.0
         }
       }
     } ]
   } ],
   "remoteSources" : [ ],
   "estimates" : [ {
     "outputRowCount" : 15000.0,
     "totalSize" : 1057364.0,
     "confident" : true,
     "variableStatistics" : {
       "value1<varchar(79)>" : {
         "lowValue" : "-Infinity",
         "highValue" : "Infinity",
         "nullsFraction" : 0.0,
         "averageRowSize" : 48.49093333333333,
         "distinctValuesCount" : 15000.0
       },
       "key1<bigint>" : {
         "lowValue" : 1.0,
         "highValue" : 60000.0,
         "nullsFraction" : 0.0,
         "averageRowSize" : "NaN",
         "distinctValuesCount" : 15000.0
       }
     }
   } ]
 }
(1 row)
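
A monitoring consumer of the "After Change" output would typically walk the plan tree and read the `estimates` entry of each node. The block below is a minimal sketch of that walk, not code from this PR: `PlanEstimatesSketch`, `PlanNode`, `Estimate`, `toDouble`, and `summarize` are hypothetical names, and the tree is built by hand instead of being parsed from JSON. It also shows one detail visible in the output: JSON has no literal for NaN or the infinities, so those values appear as strings and have to be coerced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the JSON shape shown above; these are not Presto classes.
class PlanEstimatesSketch {
    record Estimate(double outputRowCount, double totalSize, boolean confident) {}

    record PlanNode(String id, String name, List<PlanNode> children, List<Estimate> estimates) {}

    // Finite values arrive as JSON numbers, while NaN and the infinities
    // arrive as the strings "NaN", "Infinity" and "-Infinity".
    static double toDouble(Object jsonValue) {
        if (jsonValue instanceof Number) {
            return ((Number) jsonValue).doubleValue();
        }
        // Double.parseDouble accepts "NaN", "Infinity" and "-Infinity".
        return Double.parseDouble(jsonValue.toString());
    }

    // Depth-first walk that produces one summary line per plan node.
    static List<String> summarize(PlanNode root) {
        List<String> lines = new ArrayList<>();
        collect(root, 0, lines);
        return lines;
    }

    private static void collect(PlanNode node, int depth, List<String> lines) {
        String stats = "no estimate";
        if (!node.estimates().isEmpty()) {
            Estimate first = node.estimates().get(0);
            stats = String.format(Locale.ROOT, "rows=%.0f size=%.0f confident=%b",
                    first.outputRowCount(), first.totalSize(), first.confident());
        }
        lines.add("  ".repeat(depth) + node.name() + "[" + node.id() + "] " + stats);
        for (PlanNode child : node.children()) {
            collect(child, depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        // Values copied from the sample plan above.
        Estimate estimate = new Estimate(15000.0, 1057364.0, true);
        PlanNode scan = new PlanNode("0", "TableScan", List.of(), List.of(estimate));
        PlanNode exchange = new PlanNode("80", "RemoteStreamingExchange", List.of(scan), List.of(estimate));
        PlanNode output = new PlanNode("5", "Output", List.of(exchange), List.of(estimate));
        summarize(output).forEach(System.out::println);
        System.out.println(Double.isNaN(toDouble("NaN")));
    }
}
```

In a real consumer the records would be populated by whatever JSON library is already in use; only the traversal and the string-to-double coercion are the point here.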

@fgwang7w fgwang7w changed the title Enable estimated stats on plan output for format in graphviz and json Enable estimated stats on query plan for instrumentation monitoring Nov 16, 2022
@fgwang7w fgwang7w changed the title Enable estimated stats on query plan for instrumentation monitoring Enable Estimated Stats on Query Plan for Instrumentation Monitoring Nov 16, 2022
@fgwang7w fgwang7w marked this pull request as ready for review November 16, 2022 17:22
@fgwang7w fgwang7w requested a review from a team as a code owner November 16, 2022 17:22
@fgwang7w fgwang7w requested a review from presto-oss November 16, 2022 17:22
@fgwang7w
Member Author

@aaneja could you please help give a first pass on this PR? Thanks!

@v-jizhang
Contributor

@bot kick off tests

Contributor

@aaneja aaneja left a comment

Some minor comments, but LGTM otherwise

@fgwang7w fgwang7w force-pushed the graphvizstats branch 5 times, most recently from 93ac48d to 7db18ba Compare December 6, 2022 01:52
@fgwang7w fgwang7w requested a review from aaneja December 6, 2022 01:59
@fgwang7w fgwang7w force-pushed the graphvizstats branch 2 times, most recently from 769bfad to bf82e3b Compare December 6, 2022 04:50
@yingsu00 yingsu00 self-requested a review December 9, 2022 04:54
Contributor

@aaneja aaneja left a comment

LGTM, minor comments

@fgwang7w fgwang7w force-pushed the graphvizstats branch 2 times, most recently from e6eea28 to 9ac09ab Compare December 12, 2022 03:11
Contributor

@yingsu00 yingsu00 left a comment

@fgwang7w George, could you paste a sample of the plan output from both before and after the change?

@yingsu00 yingsu00 self-assigned this Dec 12, 2022
@fgwang7w
Member Author

@yingsu00 thank you for the review. I have updated the description with the before/after plan output, and all comments are addressed. Could you please take another look?

@fgwang7w fgwang7w force-pushed the graphvizstats branch 3 times, most recently from 885e8e4 to 3f4328c Compare December 14, 2022 02:07
@fgwang7w fgwang7w force-pushed the graphvizstats branch 2 times, most recently from c05c968 to d03908d Compare December 18, 2022 05:39
@fgwang7w
Member Author

Thank you @yingsu00 for helping review the PR. All comments are addressed. Could you please give it another pass? Thanks!

@fgwang7w
Member Author

@prestodb/committers ping!

@fgwang7w fgwang7w closed this Dec 23, 2022
@fgwang7w fgwang7w reopened this Dec 23, 2022
@fgwang7w fgwang7w force-pushed the graphvizstats branch 3 times, most recently from 56c65e1 to df3d362 Compare December 27, 2022 05:49
Contributor

@yingsu00 yingsu00 left a comment

@fgwang7w Could you please show me how you resolved the TypeManager initialization issue in presto-main/src/main/java/com/facebook/presto/sql/Serialization.java?

@@ -107,7 +107,15 @@ public void serialize(VariableReferenceExpression value, JsonGenerator jsonGener
public static class VariableReferenceExpressionDeserializer
extends KeyDeserializer
{
private final TypeManager typeManager;
// when it is on the test code path, jackson deserializer instantiates the default constructor with no arg,
Contributor

Is this paragraph still valid?

Member Author

To avoid using createTestFunctionAndTypeManager in the main code path of Serialization, the fix creates a JSON object mapper provider that registers the serializer and deserializer for VariableReferenceExpression, and uses the resulting custom JSON codec for serialization. No special no-arg handling is needed on the test code path: in TestExplainVerification, for example, the planCodec is bound to a JsonObjectMapperProvider that uses VariableReferenceExpressionDeserializer.
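
The design choice described in this reply, passing the type lookup in through the constructor instead of relying on a no-arg constructor that falls back to a test helper, can be sketched without Jackson or Presto on the classpath. This is only an illustration under stated assumptions: `VariableKeySketch`, `Variable`, and the `Function<String, String>` lookup are hypothetical stand-ins for VariableReferenceExpression and TypeManager, and the `name<type>` key format is inferred from keys such as `key1<bigint>` in the plan output above, not taken from the merged code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-ins; these are not the Presto or Jackson classes.
class VariableKeySketch {
    record Variable(String name, String type) {}

    static final class VariableKeyDeserializer {
        private final Function<String, String> typeLookup;

        // There is deliberately no no-arg constructor: whoever builds the
        // object mapper must supply the lookup, in tests and in production alike.
        VariableKeyDeserializer(Function<String, String> typeLookup) {
            this.typeLookup = Objects.requireNonNull(typeLookup, "typeLookup is null");
        }

        // Splits a key of the assumed form name<type> and resolves the type part.
        Variable deserializeKey(String key) {
            int open = key.indexOf('<');
            if (open <= 0 || !key.endsWith(">")) {
                throw new IllegalArgumentException("Expected name<type>, got: " + key);
            }
            String name = key.substring(0, open);
            String signature = key.substring(open + 1, key.length() - 1);
            return new Variable(name, typeLookup.apply(signature));
        }
    }

    public static void main(String[] args) {
        // A map stands in for the type manager in this sketch.
        Map<String, String> knownTypes = Map.of("bigint", "BIGINT", "varchar(79)", "VARCHAR(79)");
        VariableKeyDeserializer deserializer = new VariableKeyDeserializer(knownTypes::get);
        System.out.println(deserializer.deserializeKey("key1<bigint>"));
        System.out.println(deserializer.deserializeKey("value1<varchar(79)>"));
    }
}
```

Because the dependency is a constructor argument, a test can bind a provider with a test lookup while the main code path binds the real one, and neither path needs a special case.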

@fgwang7w
Member Author

Tests are all green. Many thanks for reviewing and approving this PR, @yingsu00 @aaneja.

@yingsu00 yingsu00 merged commit 94c1d72 into prestodb:master Dec 30, 2022
@fgwang7w fgwang7w deleted the graphvizstats branch December 30, 2022 05:30
@wanglinsong wanglinsong mentioned this pull request Jan 12, 2023
30 tasks
4 participants