[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue #6759

dbeatty10 · 2023-01-26T18:53:28Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The current implementation here looks through all four of these nodes, no matter what.

We can do better! In fact, we only need to add nodes with an in-degree of 0. This means we only need to visit nodes that could possibly have an in-degree of zero. This only happens to a node whose predecessor is removed, so we can restrict visits to successor nodes of removed nodes.

Example

We can make it so that if Node 2 is removed, then only its successors (Nodes 3 and 4) will be visited, and only Node 3 will change to an in-degree of zero. Only nodes with an in-degree of zero need to be re-added to the queue -- in this case, only Node 3.

Average case

We can see that before it would visit all X nodes in the average case. But now the average case is just the typical number of successors (i.e., the mean number of out-degrees per node).

I would guess for most dbt projects, the mean number of out-degrees will be WAAAAY closer to 0 than it will be to X, hence "taking almost no time at all" 😉

Describe alternatives you've considered

The status quo is an alternative, but not a very competitive one 😄

There might be some other optimizations we can pursue, but this is a good place to start.

Who will this benefit?

This should help somewhat with large dbt projects with 1000+ nodes like described in #6073

Are you interested in contributing this feature?

See #6720

Anything else?

No response

dbeatty10 added enhancement New feature or request triage labels Jan 26, 2023

github-actions bot changed the title ~~[Feature] Consider only successors of the deleted nodes when populating the GraphQueue~~ [CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue Jan 26, 2023

dbeatty10 mentioned this issue Jan 26, 2023

Graph Analysis Optimization for Large Dags #6720

Merged

6 tasks

peterallenwebb closed this as completed in #6720 Jan 26, 2023

peterallenwebb removed the triage label Jan 26, 2023

dbeatty10 added the performance label Apr 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue #6759

[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue #6759

dbeatty10 commented Jan 26, 2023

[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue #6759

[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue #6759

Comments

dbeatty10 commented Jan 26, 2023

Is this your first time submitting a feature request?

Describe the feature

Example

Average case

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?