Improve performance of data generation #34

wendigo · 2023-04-13T17:19:46Z

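Even for a simple query, performance is much better: on tpch.sf100.orders the reported elapsed time drops from 15.32 to 9.38, and throughput rises from 9.79M rows/s to 16M rows/s.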

After:

trino> select lower(orderstatus), count(1) from tpch.sf100.orders group by lower(orderstatus);
 _col0 |  _col1
-------+----------
 p     |  3841445
 o     | 73086053
 f     | 73072502
(3 rows)

Query 20230413_162027_00025_v32u9, FINISHED, 3 nodes
Splits: 54 total, 54 done (100.00%)
9.38 [150M rows, 0B] [16M rows/s, 0B/s]

Before:

trino> select lower(orderstatus), count(1) from tpch.sf100.orders group by lower(orderstatus);
 _col0 |  _col1
-------+----------
 p     |  3841445
 o     | 73086053
 f     | 73072502
(3 rows)

Query 20230413_171901_00025_x7epn, FINISHED, 3 nodes
Splits: 54 total, 54 done (100.00%)
15.32 [150M rows, 0B] [9.79M rows/s, 0B/s]
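For reference, the improvement implied by the two summaries can be worked out with a few lines of arithmetic. This is only a sketch: it assumes the leading number in each summary line (9.38 and 15.32) is wall-clock seconds, which is consistent with the rows/s figures shown, and the function names are made up for illustration.

```python
# Row counts copied from the query result above (p, o, f).
ROW_COUNTS = [3_841_445, 73_086_053, 73_072_502]

# Assumption: the leading number in each CLI summary is elapsed seconds.
AFTER_SECONDS = 9.38
BEFORE_SECONDS = 15.32


def throughput(rows: int, seconds: float) -> float:
    """Rows processed per second."""
    return rows / seconds


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


def time_saved_fraction(before_seconds: float, after_seconds: float) -> float:
    """Fraction of the original elapsed time that was eliminated."""
    return 1.0 - after_seconds / before_seconds


if __name__ == "__main__":
    rows = sum(ROW_COUNTS)
    print(f"rows:    {rows:,}")
    print(f"before:  {throughput(rows, BEFORE_SECONDS) / 1e6:.2f}M rows/s")
    print(f"after:   {throughput(rows, AFTER_SECONDS) / 1e6:.2f}M rows/s")
    print(f"speedup: {speedup(BEFORE_SECONDS, AFTER_SECONDS):.2f}x")
    print(f"saved:   {time_saved_fraction(BEFORE_SECONDS, AFTER_SECONDS):.0%}")
```

Under that assumption the run is roughly 1.63x faster, which is about 39% less elapsed time for the same 150M rows. A single pair of runs is not a rigorous benchmark, so the ratio is indicative only.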

Improve performance of data generation

ec01365

cla-bot bot added the cla-signed label Apr 13, 2023

martint approved these changes Apr 13, 2023

martint merged commit 4069e01 into trinodb:master Apr 14, 2023

wendigo deleted the serafin/faster-generation branch April 14, 2023 05:01

wendigo mentioned this pull request Apr 14, 2023

Improve performance of data generation #36

Merged

wendigo mentioned this pull request Aug 30, 2023

Multi-join pushdown appends recursively _<number> to column names so that they exceed maximum alias length trinodb/trino#18642

Closed
