Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CT-2414: Add graph summaries to target directory output #7358

Merged
merged 8 commits into from
Apr 28, 2023

Conversation

peterallenwebb
Copy link
Contributor

@peterallenwebb peterallenwebb commented Apr 13, 2023

resolves #7357

Description

Adds a new graph summary output file named graph_summary.json to the target directory output.

Checklist

@peterallenwebb peterallenwebb requested review from a team and aranke April 13, 2023 21:21
@cla-bot cla-bot bot added the cla:yes label Apr 13, 2023
@github-actions
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@dbeatty10
Copy link
Contributor

To keep the number of artifacts to a minimum, could we just write out graph_with_test_edges.json?

From there, consumers could filter out any resource types they don't want.

I know we started to discuss that idea here, but now that you have this pull request, it's probably useful to move that conversation to here.

@peterallenwebb peterallenwebb requested a review from a team as a code owner April 25, 2023 18:26
@dbeatty10
Copy link
Contributor

@peterallenwebb Would you be open to just a single JSON artifact that is a superset of both artifacts?

That way, there's just one artifact to produce/consumer, and the end user can filter out any nodes they want to ignore.

Some benefits is that it would make the docs and implementation more simple.

@peterallenwebb
Copy link
Contributor Author

@dbeatty10 I am just getting back to working on this, and not disregarding your earlier request, I promise. Using a single file will complicate the implementation, especially since the tests are added on one code path but not another, so I'm trying to figure out how to keep both the implementation and the output simple.

I think I am going to spin the snowplow stats off as a separate case, though.

@dbeatty10
Copy link
Contributor

No worries @peterallenwebb !

Spinning off the snowplow stats seems like nice effort that is complementary.

I was mainly wondering what would be lost if you just dropped this line:

linker.write_graph_summary("linked", out_stream, manifest)

Is it because graph_with_test_edges.json would not be a simple superset of graph.json?

@peterallenwebb
Copy link
Contributor Author

@dbeatty10 There is some debate about whether the algorithm for adding the test edges is completely correct, and we know it has performance issues that we'd like to address, so my idea here is to make sure we know exactly what the graph looks like before and after that algorithm runs. That way we'll have a known starting point for refining and/or optimizing that algorithm.

I'm going to take one more pass at making this simpler.

Copy link
Contributor

@iknox-fa iknox-fa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM-- I bet we'll want to make some tweaks at some point (esp wrt test edges), but this seems like a very sane starting place.

Copy link
Member

@aranke aranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write a test here?

@peterallenwebb peterallenwebb requested a review from a team as a code owner April 28, 2023 19:23
@peterallenwebb peterallenwebb requested review from emmyoop and removed request for a team April 28, 2023 19:23
@peterallenwebb
Copy link
Contributor Author

Added a test as suggested and included the invocation_id in the summary so that it could more easily be correlated with logging events and other artifacts of the same dbt invocation.

@peterallenwebb peterallenwebb merged commit c56a9b2 into main Apr 28, 2023
@peterallenwebb peterallenwebb deleted the paw/CT-2414-graph-summary branch April 28, 2023 20:00
@peterallenwebb
Copy link
Contributor Author

@boxysean You may find this interesting.

@aranke aranke mentioned this pull request May 9, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-2414] Output Graph Summary Information on When dbt Compiles
4 participants