-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maximally parallelize dbt clone #10129
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10129 +/- ##
==========================================
+ Coverage 88.19% 88.68% +0.49%
==========================================
Files 181 180 -1
Lines 22786 22446 -340
==========================================
- Hits 20096 19907 -189
+ Misses 2690 2539 -151
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I've done some local 🎩 benchmarking of 🎩 Setup:
On a project with a chain of 5 models (i.e. 1 -> 2 -> 3 -> 4 -> 5):Recording average execution time of
On a project with a chain of 10 models:Recording average execution time of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice that this was fairly simple. Would like just an additional comment as noted on the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late to the party and sorry about being nitpicking here.
Let's discuss a bit as this is probably going to be used as example for next set of tests
self.forced_exception_class = exception_class | ||
self.did_cancel: bool = False | ||
super().__init__(args=MockArgs(), config=MockConfig(), manifest=None) | ||
self.manifest = make_manifest(nodes=nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like we are reimplementing the task logic in the test again. I view unit test as documenting behavior in this case, so the only thing we need to test at the runnable task is: are we using the correct arguments used when calling get_graph_queue
?
@@ -40,13 +63,25 @@ def _cancel_connections(self, pool): | |||
|
|||
def get_node_selector(self): | |||
"""This is an `abstract_method` on `GraphRunnableTask`, thus we must implement it""" | |||
return None | |||
selector = ResourceTypeSelector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
||
def defer_to_manifest(self, adapter, selected_uids: AbstractSet[str]): | ||
"""This is an `abstract_method` on `GraphRunnableTask`, thus we must implement it""" | ||
return None | ||
|
||
|
||
class MockRunnableTaskIndependent(MockRunnableTask): | ||
def get_run_mode(self) -> GraphRunnableMode: | ||
return GraphRunnableMode.Independent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are using inheritance where we should be using patch
@@ -387,3 +388,17 @@ def replace_config(n, **kwargs): | |||
config=n.config.replace(**kwargs), | |||
unrendered_config=dict_replace(n.unrendered_config, **kwargs), | |||
) | |||
|
|||
|
|||
def make_manifest(nodes=[], sources=[], macros=[], docs=[]) -> Manifest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is duplicating
dbt-core/tests/unit/utils/manifest.py
Line 986 in 84456f5
def manifest( |
resolves #7914
Problem
For GraphRunnableTasks such as the CloneTask, the run queue always enforced topological order of dependencies during execution, even when the task does not strictly require it.
Solution
Make it possible for GraphRunnableTask subclasses to provide a
PRESERVE_EDGES
class attribute that directs the graph queue generation to remove edges from the graph prior to constructing the priority queue. The reason a class attribute was chosen was because this does not need to be user-configurable and should be a constant queue mechanism for all invocations of the clone command.Checklist