Cheeky performance improvement on big DAGs #6694

Merged
merged 4 commits into from
Jan 23, 2023

@boxysean boxysean commented Jan 22, 2023

resolves #6697

Description

Small optimization to manifest parsing that benefits large DAGs.

Swapping the two conditions of the if statement, as shown in 9f96087, short-circuits the relatively expensive _already_known method call. In a test DAG provided by my client, with 8468 models and 17103 tests, this significantly reduced the number of calls to _already_known and reduced runtime in my test environment. See the profiling screenshot of a dbt compile command:

[Image: Screenshot 2023-01-22 at 9 12 15 PM]

I don't know precisely how this will translate to real-world results, but it's safe to say that (1) there are fewer calls to a relatively expensive method, and (2) the logic is the same. My client reports that any dbt build command takes 15 minutes to start up; I hope that afterwards the startup time will be between 11 and 12 minutes.
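
For readers unfamiliar with the trick, here is a minimal sketch of why reordering the operands of an `and` can cut calls to a costly method without changing the result. It is not dbt's actual code: `expensive_check` merely stands in for a method like _already_known, and `cheap_check`, `select`, and the modulo conditions are hypothetical names and logic invented for illustration. The swap is only safe when both checks are free of side effects.

```python
# Illustrative only: hypothetical stand-ins, not dbt's real implementation.
calls = {"expensive": 0}


def expensive_check(node: int) -> bool:
    """Stand-in for a costly method; counts how often it is invoked."""
    calls["expensive"] += 1
    return node % 2 == 0


def cheap_check(node: int) -> bool:
    """Stand-in for an inexpensive condition that is usually False."""
    return node % 10 == 0


def select(nodes, cheap_first: bool):
    """Filter nodes with both checks; return the picks and the expensive-call count."""
    calls["expensive"] = 0
    if cheap_first:
        # `and` stops at the first False operand, so the expensive check
        # only runs for nodes that already passed the cheap one.
        picked = [n for n in nodes if cheap_check(n) and expensive_check(n)]
    else:
        # The expensive check runs for every node.
        picked = [n for n in nodes if expensive_check(n) and cheap_check(n)]
    return picked, calls["expensive"]


nodes = range(1000)
before, calls_before = select(nodes, cheap_first=False)
after, calls_after = select(nodes, cheap_first=True)

print(before == after)            # same result either way
print(calls_before, calls_after)  # expensive calls: original order vs swapped order
```

How much is saved depends entirely on how often the cheap condition is false; in this toy setup it is false for nine nodes out of ten.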


@cla-bot cla-bot bot added the cla:yes label Jan 22, 2023
@boxysean boxysean force-pushed the cheeky_perf_improvement_big_dags branch from 8d24180 to 9f96087 Compare January 22, 2023 20:49
@boxysean boxysean marked this pull request as ready for review January 22, 2023 20:54
@boxysean boxysean requested a review from a team January 22, 2023 20:54
@boxysean boxysean requested a review from a team as a code owner January 22, 2023 20:54
@boxysean boxysean requested review from aranke and Fleid January 22, 2023 20:54

@dbeatty10 dbeatty10 left a comment

Very smart @boxysean ! 🧠 Commutativity for the win! 🏆

One small suggestion:

  • As an alternative to #6073, I created a stand-alone issue for this precision change that we could link back to instead
  • We can treat #6073 more like #5527

Open to discussion if this seems unnecessary or undesirable.

.changes/unreleased/Under the Hood-20230122-215235.yaml (outdated, resolved)
Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
@boxysean

Perfect, plan looks good @dbeatty10! Thanks, let me know if you need anything else from me.

@dbeatty10 dbeatty10 left a comment

Let's goooo!

I will merge once it finishes its additional round of CI checks.

@dbeatty10 dbeatty10 merged commit 5c765bf into dbt-labs:main Jan 23, 2023
@jtcohen6 jtcohen6 mentioned this pull request Feb 7, 2023
6 tasks
Successfully merging this pull request may close these issues.

[CT-1881] [Feature] When possible, bypass an expensive method during graph construction