Flaky TestIcebergParquet*TpcdsCostPlan for Q85 #18702

Closed
martint opened this issue Aug 16, 2023 · 4 comments · Fixed by #18880

martint commented Aug 16, 2023

It fails locally with the following difference:

--- expected.sql	2023-08-16 12:11:19
+++ actual.sql	2023-08-16 12:11:28
@@ -6,7 +6,7 @@
                     partial aggregation over (r_reason_desc)
                         join (INNER, REPLICATED):
                             join (INNER, REPLICATED):
-                                dynamic filter (["cd_demo_sk_6", "cd_education_status_9", "cd_marital_status_8"])
+                                dynamic filter (["cd_demo_sk", "cd_education_status", "cd_marital_status"])
                                     scan customer_demographics
                                 local exchange (GATHER, SINGLE, [])
                                     remote exchange (REPLICATE, BROADCAST, [])
@@ -28,7 +28,7 @@
                                                                                 scan date_dim
                                                                 local exchange (GATHER, SINGLE, [])
                                                                     remote exchange (REPARTITION, HASH, ["wr_item_sk", "wr_order_number"])
-                                                                        dynamic filter (["wr_reason_sk", "wr_refunded_cdemo_sk"])
+                                                                        dynamic filter (["wr_reason_sk", "wr_returning_cdemo_sk"])
                                                                             scan web_returns
                                                             local exchange (GATHER, SINGLE, [])
                                                                 remote exchange (REPLICATE, BROADCAST, [])

Expected:

local exchange (GATHER, SINGLE, [])
    remote exchange (GATHER, SINGLE, [])
        final aggregation over (r_reason_desc)
            local exchange (GATHER, SINGLE, [])
                remote exchange (REPARTITION, HASH, ["r_reason_desc"])
                    partial aggregation over (r_reason_desc)
                        join (INNER, REPLICATED):
                            join (INNER, REPLICATED):
                                dynamic filter (["cd_demo_sk_6", "cd_education_status_9", "cd_marital_status_8"])
                                    scan customer_demographics
                                local exchange (GATHER, SINGLE, [])
                                    remote exchange (REPLICATE, BROADCAST, [])
                                        join (INNER, REPLICATED):
                                            join (INNER, PARTITIONED):
                                                remote exchange (REPARTITION, HASH, ["ca_address_sk"])
                                                    dynamic filter (["ca_address_sk"])
                                                        scan customer_address
                                                local exchange (GATHER, SINGLE, [])
                                                    remote exchange (REPARTITION, HASH, ["wr_refunded_addr_sk"])
                                                        join (INNER, REPLICATED):
                                                            join (INNER, PARTITIONED):
                                                                remote exchange (REPARTITION, HASH, ["ws_item_sk", "ws_order_number"])
                                                                    join (INNER, REPLICATED):
                                                                        dynamic filter (["ws_item_sk", "ws_order_number", "ws_sold_date_sk", "ws_web_page_sk"])
                                                                            scan web_sales
                                                                        local exchange (GATHER, SINGLE, [])
                                                                            remote exchange (REPLICATE, BROADCAST, [])
                                                                                scan date_dim
                                                                local exchange (GATHER, SINGLE, [])
                                                                    remote exchange (REPARTITION, HASH, ["wr_item_sk", "wr_order_number"])
                                                                        dynamic filter (["wr_reason_sk", "wr_refunded_cdemo_sk"])
                                                                            scan web_returns
                                                            local exchange (GATHER, SINGLE, [])
                                                                remote exchange (REPLICATE, BROADCAST, [])
                                                                    scan customer_demographics
                                            local exchange (GATHER, SINGLE, [])
                                                remote exchange (REPLICATE, BROADCAST, [])
                                                    scan web_page
                            local exchange (GATHER, SINGLE, [])
                                remote exchange (REPLICATE, BROADCAST, [])
                                    scan reason

Actual:

local exchange (GATHER, SINGLE, [])
    remote exchange (GATHER, SINGLE, [])
        final aggregation over (r_reason_desc)
            local exchange (GATHER, SINGLE, [])
                remote exchange (REPARTITION, HASH, ["r_reason_desc"])
                    partial aggregation over (r_reason_desc)
                        join (INNER, REPLICATED):
                            join (INNER, REPLICATED):
                                dynamic filter (["cd_demo_sk", "cd_education_status", "cd_marital_status"])
                                    scan customer_demographics
                                local exchange (GATHER, SINGLE, [])
                                    remote exchange (REPLICATE, BROADCAST, [])
                                        join (INNER, REPLICATED):
                                            join (INNER, PARTITIONED):
                                                remote exchange (REPARTITION, HASH, ["ca_address_sk"])
                                                    dynamic filter (["ca_address_sk"])
                                                        scan customer_address
                                                local exchange (GATHER, SINGLE, [])
                                                    remote exchange (REPARTITION, HASH, ["wr_refunded_addr_sk"])
                                                        join (INNER, REPLICATED):
                                                            join (INNER, PARTITIONED):
                                                                remote exchange (REPARTITION, HASH, ["ws_item_sk", "ws_order_number"])
                                                                    join (INNER, REPLICATED):
                                                                        dynamic filter (["ws_item_sk", "ws_order_number", "ws_sold_date_sk", "ws_web_page_sk"])
                                                                            scan web_sales
                                                                        local exchange (GATHER, SINGLE, [])
                                                                            remote exchange (REPLICATE, BROADCAST, [])
                                                                                scan date_dim
                                                                local exchange (GATHER, SINGLE, [])
                                                                    remote exchange (REPARTITION, HASH, ["wr_item_sk", "wr_order_number"])
                                                                        dynamic filter (["wr_reason_sk", "wr_returning_cdemo_sk"])
                                                                            scan web_returns
                                                            local exchange (GATHER, SINGLE, [])
                                                                remote exchange (REPLICATE, BROADCAST, [])
                                                                    scan customer_demographics
                                            local exchange (GATHER, SINGLE, [])
                                                remote exchange (REPLICATE, BROADCAST, [])
                                                    scan web_page
                            local exchange (GATHER, SINGLE, [])
                                remote exchange (REPLICATE, BROADCAST, [])
                                    scan reason

It works fine in CI, though.

martint commented Aug 16, 2023

cc @raunaqmorarka

raunaqmorarka commented Aug 17, 2023

I ran TestIcebergParquetTpcdsCostBasedPlan and TestIcebergParquetPartitionedTpcdsCostBasedPlan locally with an invocation count of 100 but couldn't reproduce the failure.
cc: @sopel39 @Dith3r

Dith3r commented Aug 17, 2023

After rebasing the repository to HEAD and clearing the Maven cache, I get failures in:

TestIcebergParquetPartitionedTpcdsCostBasedPlan
 \ TestIcebergParquetPartitionedTpcdsCostBasedPlan.test[/sql/presto/tpcds/q85.sql]
TestIcebergParquetTpcdsCostBasedPlan
 \ TestIcebergParquetTpcdsCostBasedPlan.test[/sql/presto/tpcds/q85.sql]

The failures are consistent and the same as those described by @martint.

sopel39 commented Aug 24, 2023

I've replaced HashMap and HashSet in the planner with their linked versions, and I still get dynamic filter (["cd_demo_sk", "cd_education_status", "cd_marital_status"]) on my machine.

@Dith3r did you try bisecting to find which commit introduced the regression?
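
For readers unfamiliar with the idea behind swapping HashMap and HashSet for their linked versions, here is a minimal standalone sketch. It is not Trino planner code; the class name `IterationOrderSketch` and the helper `iterationOrder` are hypothetical, and the symbol names are only sample data borrowed from the plans above. It shows that a `HashSet` makes no promise about iteration order while a `LinkedHashSet` always iterates in insertion order. Since the experiment above still produced the differing plan, unordered collections may well not be the cause here; the sketch illustrates only the hypothesis that was tested.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of the Trino code base.
class IterationOrderSketch {
    // Adds the symbols to the given set and returns them in the set's iteration order.
    public static List<String> iterationOrder(Set<String> target, List<String> symbols) {
        target.addAll(symbols);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        // Sample data only: symbol names taken from the plans in this issue.
        List<String> symbols = List.of("wr_reason_sk", "wr_refunded_cdemo_sk", "wr_returning_cdemo_sk");

        // HashSet: order depends on hash codes and table size and is unspecified by the API.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<>(), symbols));

        // LinkedHashSet: iteration order is always the insertion order.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<>(), symbols));
    }
}
```

If code that picks "the first" candidate from an unordered collection can affect a plan, switching to the linked variants makes that choice reproducible, which is why it is a quick way to rule this class of problem in or out.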
