Improve performance of data generation #34

Merged Apr 14, 2023 (1 commit)

@wendigo (Contributor) commented Apr 13, 2023

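Even for a simple query, performance is much better: the same aggregation over tpch.sf100.orders (150M rows, 3 nodes) finishes in 9.38 s instead of 15.32 s.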

After:

trino> select lower(orderstatus), count(1) from tpch.sf100.orders group by lower(orderstatus);
 _col0 |  _col1
-------+----------
 p     |  3841445
 o     | 73086053
 f     | 73072502
(3 rows)

Query 20230413_162027_00025_v32u9, FINISHED, 3 nodes
Splits: 54 total, 54 done (100.00%)
9.38 [150M rows, 0B] [16M rows/s, 0B/s]

Before:

trino> select lower(orderstatus), count(1) from tpch.sf100.orders group by lower(orderstatus);
 _col0 |  _col1
-------+----------
 p     |  3841445
 o     | 73086053
 f     | 73072502
(3 rows)

Query 20230413_171901_00025_x7epn, FINISHED, 3 nodes
Splits: 54 total, 54 done (100.00%)
15.32 [150M rows, 0B] [9.79M rows/s, 0B/s]
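To put a number on the difference, here is a small sketch that derives throughput and speedup from the two CLI summaries above. The figures are copied from the output. The helper names (`throughput`, `speedup`, `time_saved`) are made up for illustration and are not part of this PR, and each figure comes from a single run, so treat the result as indicative rather than a rigorous benchmark.

```python
# Figures copied from the two Trino CLI summaries in this PR description.
ROWS = 150_000_000       # "150M rows", as rounded by the CLI in both runs
BEFORE_SECONDS = 15.32   # elapsed time before the change
AFTER_SECONDS = 9.38     # elapsed time after the change


def throughput(rows: int, seconds: float) -> float:
    """Rows processed per second (hypothetical helper)."""
    return rows / seconds


def speedup(before: float, after: float) -> float:
    """How many times faster the 'after' run is (hypothetical helper)."""
    return before / after


def time_saved(before: float, after: float) -> float:
    """Fraction of elapsed time removed by the change (hypothetical helper)."""
    return 1.0 - after / before


if __name__ == "__main__":
    print(f"before:  {throughput(ROWS, BEFORE_SECONDS) / 1e6:.2f}M rows/s")
    print(f"after:   {throughput(ROWS, AFTER_SECONDS) / 1e6:.2f}M rows/s")
    print(f"speedup: {speedup(BEFORE_SECONDS, AFTER_SECONDS):.2f}x")
    print(f"saved:   {time_saved(BEFORE_SECONDS, AFTER_SECONDS):.1%}")
```

The computed throughputs line up with the 9.79M and 16M rows/s that the CLI reports, and the speedup works out to roughly 1.6x for this query.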

@cla-bot (bot) added the cla-signed label Apr 13, 2023
2 participants