I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
The current implementation here looks through all four of these nodes, no matter what.
We can do better! The queue only needs to pick up nodes with an in-degree of 0, so we only need to visit nodes whose in-degree could have just dropped to zero. A node's in-degree only changes when one of its predecessors is removed, so we can restrict visits to the successors of removed nodes.
Example
We can make it so that if Node 2 is removed, then only its successors (Nodes 3 and 4) will be visited, and only Node 3 will change to an in-degree of zero. Only nodes with an in-degree of zero need to be re-added to the queue -- in this case, only Node 3.
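The example above can be sketched in plain Python. This is only an illustration, not dbt's actual `GraphQueue` code: the helper names `build_graph` and `remove_node` are hypothetical, and the edge list is an assumption chosen to match the description (Node 4 keeps one remaining predecessor after Node 2 is removed).

```python
def build_graph(nodes, edges):
    """Build successor lists and in-degree counts for a directed graph."""
    successors = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for parent, child in edges:
        successors[parent].append(child)
        in_degree[child] += 1
    return successors, in_degree


def remove_node(node, successors, in_degree):
    """Remove `node`, inspecting only its successors.

    Returns (visited, newly_ready): the nodes that were inspected, and the
    subset whose in-degree dropped to zero as a result of the removal.
    """
    visited, newly_ready = [], []
    for child in successors.pop(node):
        visited.append(child)
        in_degree[child] -= 1
        if in_degree[child] == 0:
            newly_ready.append(child)
    del in_degree[node]
    return visited, newly_ready


# Assumed edges (the original diagram is not reproduced here):
# 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 4
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
successors, in_degree = build_graph(nodes, edges)

remove_node(1, successors, in_degree)  # Node 1 finishes first
visited, newly_ready = remove_node(2, successors, in_degree)

print(visited)      # [3, 4]
print(newly_ready)  # [3]
```

Under these assumed edges, Node 4 is inspected but not queued, because Node 3 is still one of its predecessors.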
Average case
Before, it would visit all X nodes in the average case. Now the average case is just the typical number of successors (i.e., the mean out-degree per node).
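As a rough back-of-the-envelope comparison over a whole run: assume the graph has $X$ nodes and edge set $E$, that nodes are removed one at a time, and that a full scan inspects every node still in the graph after each removal. These are simplifying assumptions, not measurements of dbt itself.

$$
\underbrace{\sum_{k=1}^{X} (X - k)}_{\text{full scan}} = \frac{X(X-1)}{2}
\qquad \text{vs.} \qquad
\underbrace{\sum_{v} \operatorname{outdeg}(v)}_{\text{successors only}} = |E|
$$

Per removal, that is on the order of $X$ visits for the full scan versus $|E| / X$ visits (the mean out-degree) when only successors are inspected.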
I would guess that for most dbt projects, the mean out-degree will be WAAAAY closer to 0 than to X, hence "taking almost no time at all" 😉
Describe alternatives you've considered
The status quo is an alternative, but not a very competitive one 😄
There might be some other optimizations we can pursue, but this is a good place to start.
Who will this benefit?
This should help somewhat with large dbt projects with 1000+ nodes, like those described in #6073
github-actions bot changed the title from "[Feature] Consider only successors of the deleted nodes when populating the GraphQueue" to "[CT-1927] [Feature] Consider only successors of the deleted nodes when populating the GraphQueue" on Jan 26, 2023
Is this your first time submitting a feature request?
Are you interested in contributing this feature?
See #6720
Anything else?
No response