
[SPIKE] Testing PR #6322 (build performance update) #6562

Open · iknox-fa opened this issue Jan 10, 2023 · 4 comments

Comments

@iknox-fa (Contributor) commented Jan 10, 2023

This ticket encompasses the work necessary to review PR #6322 (related to #6073) and determine the following:

  • Does this actually improve performance? The contributor claims some level of performance improvement, but we need proper benchmarks, and the core team should be responsible for building them, especially since initial testing on smaller projects and synthetic benchmarking projects doesn't seem to show much of an improvement.
  • Is this algorithm the right choice to use across the board? Does it have performance characteristics that fit everyone's needs?

Once this spike determines that this work is suitable for what we need, we can work with the contributor to correct some code issues and add some testing. If it isn't suitable, we should probably prioritize putting dbt engineering hours into this, since it has clearly been an issue for some time and, judging by Slack conversations, there's a real pain point to be solved here.

This ticket should also drive us to set up better performance monitoring around the build command.

Slack refs:

@github-actions github-actions bot changed the title [SPIKE] Testing [CT-1781] [SPIKE] Testing Jan 10, 2023
@iknox-fa iknox-fa changed the title [CT-1781] [SPIKE] Testing [SPIKE] Testing PR #6322 (build performance update) Jan 10, 2023
@jtcohen6 (Contributor)

Prior art for performance benchmarking + regression testing: https://github.com/dbt-labs/dbt-core/tree/main/performance

With partial parsing enabled, I believe we could use this top-level command to reliably isolate the time associated with compiling a DAG with "extra test edges": `dbt build --exclude fqn:*`
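For reference, here is a minimal sketch of how that command could be timed against a baseline install and one with #6322 applied. The virtualenv paths and run count below are assumptions for illustration, not an established benchmark setup:

```python
import statistics
import subprocess
import time

# Hypothetical comparison: time `dbt build --exclude fqn:*` (partial parsing
# enabled) under a baseline install and under one with the #6322 patch applied.
# The virtualenv paths here are placeholders.
ENVS = {
    "baseline-1.3.2": ".venv-baseline/bin/dbt",
    "with-pr-6322": ".venv-pr6322/bin/dbt",
}
RUNS = 5

def time_build(dbt_bin: str) -> float:
    """Run the DAG-only build and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run([dbt_bin, "build", "--exclude", "fqn:*"], check=True)
    return time.perf_counter() - start

for name, dbt_bin in ENVS.items():
    # Note: the first run in each environment pays the full-parse cost that
    # seeds the partial-parsing cache, so a warm-up pass may be worth adding.
    samples = [time_build(dbt_bin) for _ in range(RUNS)]
    print(f"{name}: mean {statistics.mean(samples):.1f}s, "
          f"stdev {statistics.stdev(samples):.1f}s over {RUNS} runs")
```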

@nathaniel-may (Contributor)

The scope of this ticket is just to decide whether or not we want to merge this PR. Additional foundational performance improvements are out of scope for now.

@boxysean (Contributor)

Using dbt's built-in timing info (thanks @dbeatty10!), I ran `dbt --no-partial-parse compile` on a gnarly DAG of 8,468 models and 17,103 tests. It took 14m4s on dbt-core#tag/v1.3.2, and 16m3s on dbt-core#tag/v1.3.2 with the #6322 patch applied to dbt/graph/graph.py.

Looking at the charts below, the changed code in #6322 does indeed seem to have added time.

[snakeviz profiling charts]
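For anyone reproducing this, here is a minimal sketch of inspecting such a profile without snakeviz, using only the standard library. It assumes the cProfile dump was produced with dbt's `--record-timing-info` flag and written to a hypothetical timing.prof:

```python
import pstats

# Assumption: the profile was recorded with something like
#   dbt --record-timing-info timing.prof --no-partial-parse compile
# and "timing.prof" is a placeholder filename.
stats = pstats.Stats("timing.prof")
stats.sort_stats("cumulative")

# Print the 20 most expensive call sites; graph-related frames
# (e.g. dbt/graph/graph.py) are where the #6322 changes would surface.
stats.print_stats(20)
```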

@github-actions (bot)

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Feb 16, 2024
@github-actions github-actions bot removed the stale Issues that have gone stale label Feb 17, 2024