Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue #6759

Closed
3 tasks done
dbeatty10 opened this issue Jan 26, 2023 · 0 comments · Fixed by #6720
Closed
3 tasks done
Labels
enhancement New feature or request performance

Comments

@dbeatty10
Copy link
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The current implementation here looks through all four of these nodes, no matter what.

image

We can do better! In fact, we only need to add nodes with an in-degree of 0. This means we only need to visit nodes that could possibly have an in-degree of zero. This only happens to a node whose predecessor is removed, so we can restrict visits to successor nodes of removed nodes.

Example

We can make it so that if Node 2 is removed, then only its successors (Nodes 3 and 4) will be visited, and only Node 3 will change to an in-degree of zero. Only nodes with an in-degree of zero need to be re-added to the queue -- in this case, only Node 3.

Average case

We can see that before it would visit all X nodes in the average case. But now the average case is just the typical number of successors (i.e., the mean number of out-degrees per node).

I would guess for most dbt projects, the mean number of out-degrees will be WAAAAY closer to 0 than it will be to X, hence "taking almost no time at all" 😉

Describe alternatives you've considered

The status quo is an alternative, but not a very competitive one 😄

There might be some other optimizations we can pursue, but this is a good place to start.

Who will this benefit?

This should help somewhat with large dbt projects with 1000+ nodes like described in #6073

Are you interested in contributing this feature?

See #6720

Anything else?

No response

@dbeatty10 dbeatty10 added enhancement New feature or request triage labels Jan 26, 2023
@github-actions github-actions bot changed the title [Feature] Consider only successors of the deleted nodes when populating the GraphQueue [CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue Jan 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants