Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(graphql): adds container aspect for dataflow and datajob entities #12236

Merged

Conversation

sgomezvillamor
Copy link
Contributor

@sgomezvillamor sgomezvillamor commented Dec 27, 2024

This update extends the model by adding Container aspects to the DataFlow and DataJob entities. Additionally, GraphQL resolvers are implemented, including a new parentContainers resolver.

TODO

  • What about parentContainers?
  • Do I need to make anything related to search? No, unless I missed something 😅
  • UI

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Copy link
Collaborator

@anshbansal anshbansal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add in https://github.com/datahub-project/datahub/blob/master/metadata-service/servlet/src/main/java/com/datahub/gms/servlet/Config.java#L30-L39 this map so CLI can check the existence of functionality before trying to emit containers and causing exceptions in ingestion

@sgomezvillamor
Copy link
Contributor Author

Let's also add in https://github.com/datahub-project/datahub/blob/master/metadata-service/servlet/src/main/java/com/datahub/gms/servlet/Config.java#L30-L39 this map so CLI can check the existence of functionality before trying to emit containers and causing exceptions in ingestion

  Map<String, Object> config =
      new HashMap<String, Object>() {
        {
          put("noCode", "true");
          put("retention", "true");
          put("statefulIngestionCapable", true);
          put("patchCapable", true);
          put("timeZone", ZoneId.systemDefault().toString());
        }
      };

The map does seem high-level. While adding entries like put("supportForContainerInDataFlow", true); would allow clients to emit containers if supported in backend, it might become hard to manage over time. Perhaps introducing versioning for the model is a better approach. Has this been considered before?

@sgomezvillamor sgomezvillamor marked this pull request as ready for review January 2, 2025 11:02
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Jan 2, 2025
@sgomezvillamor
Copy link
Contributor Author

sgomezvillamor commented Jan 2, 2025

I tested with golden file from a PR which was reverted because backend missing the capability being implemented in current PR

https://github.com/datahub-project/datahub/pull/12180/files#diff-ef8bf2ae9f199997fa69bbb2f0943c0069156a8a10d97d63c7d333992810e2f2

Ingestion worked successfully with no errors and this is how it looks like in the UI

Screenshot 2025-01-02 at 12 51 14 Screenshot 2025-01-02 at 12 51 32

Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so there's actually a few more pieces here for this to be fully wired up. tbh i'm not positive how you're seeing a container in the UI at all...

  1. Add CONTAINER_ASPECT_NAME to the list of ASPECTS_TO_RESOLVE in both DataJobType.java and DataFlowType.java - this will tell us to fetch these aspects when hydrating this entity
  2. update entity.graphql for both the DataJob and DataFlow entity types to have a container field as well as parentContainers. you can see how this works for type Chart or dataset or whatever and essentially copy paste. This is our graphql schema and enables the frontend to start fetching these things
  3. update the .graphql files on the frontend for datajob and dataflow to fetch container/parentContainers to display them in the UI.

let me know if you have any questions and we can hop on a call and chat about it! i'm really curious how you were able to see containers for these entities without the above 3 steps..

Comment on lines 2386 to 2387
return dataJob.getDataPlatformInstance() != null
? dataJob.getDataPlatformInstance().getUrn()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a copy paste error here - we want to have dataJob.getContainer here not getDataPlatformInstance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! fixed in commit 21e00bc

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 8, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 9, 2025
…ude parentContainers in graphql frontend schemas
@sgomezvillamor
Copy link
Contributor Author

sgomezvillamor commented Jan 9, 2025

so there's actually a few more pieces here for this to be fully wired up.

Yes, I was expecting to miss some pieces. Thanks for the review!

tbh i'm not positive how you're seeing a container in the UI at all...

I made the changes listed below (except for point 2) and tested them locally. The UI still looks the same as the screenshots above. Could you share what specific concerns you have?

Clearly, even though I've updated the dataflow and datajob GraphQL schemas in the frontend project, I haven't modified any UI components. So unless they are being used automagically, these updates aren't in use yet.

  1. Add CONTAINER_ASPECT_NAME to the list of ASPECTS_TO_RESOLVE in both DataJobType.java and DataFlowType.java - this will tell us to fetch these aspects when hydrating this entity

Done in commit 49b7259

  1. update entity.graphql for both the DataJob and DataFlow entity types to have a container field as well as parentContainers. you can see how this works for type Chart or dataset or whatever and essentially copy paste. This is our graphql schema and enables the frontend to start fetching these things

My PR already updated datahub-graphql-core/src/main/resources/entity.graphql
You can check in https://github.com/datahub-project/datahub/pull/12236/files#diff-df9c96427d45d7af6d92dd6caa23a349357dbc4bdb915768ab4ce000a4286964
Is there any other entity.graphql I missed?

  1. update the .graphql files on the frontend for datajob and dataflow to fetch container/parentContainers to display them in the UI.

Checking other entities such as dataset or chart, I've noted only parentContainers is added, so I did the same for dataflow and datajob.

Done in commit 49b7259

let me know if you have any questions and we can hop on a call and chat about it! i'm really curious how you were able to see containers for these entities without the above 3 steps..

Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! sorry i think i just missed the original change on entity.graphql in my first review. but this is looking solid now

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Jan 9, 2025
Copy link

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 51.35135% with 18 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...com/linkedin/datahub/graphql/GmsGraphQLEngine.java 0.00% 12 Missing ⚠️
...b-react/src/app/entity/dataJob/preview/Preview.tsx 25.00% 3 Missing ⚠️
...b/graphql/types/datajob/mappers/DataJobMapper.java 85.71% 0 Missing and 1 partial ⚠️
...b-react/src/app/entity/dataFlow/DataFlowEntity.tsx 0.00% 1 Missing ⚠️
...web-react/src/app/entity/dataJob/DataJobEntity.tsx 0.00% 1 Missing ⚠️
Files with missing lines Coverage Δ
...n/datahub/graphql/types/dataflow/DataFlowType.java 15.06% <ø> (ø)
...graphql/types/dataflow/mappers/DataFlowMapper.java 34.34% <100.00%> (ø)
...din/datahub/graphql/types/datajob/DataJobType.java 0.00% <ø> (ø)
...-react/src/app/entity/dataFlow/preview/Preview.tsx 98.00% <100.00%> (+0.08%) ⬆️
...b/graphql/types/datajob/mappers/DataJobMapper.java 17.18% <85.71%> (ø)
...b-react/src/app/entity/dataFlow/DataFlowEntity.tsx 53.95% <0.00%> (-0.26%) ⬇️
...web-react/src/app/entity/dataJob/DataJobEntity.tsx 43.23% <0.00%> (-0.17%) ⬇️
...b-react/src/app/entity/dataJob/preview/Preview.tsx 26.16% <25.00%> (-0.05%) ⬇️
...com/linkedin/datahub/graphql/GmsGraphQLEngine.java 0.00% <0.00%> (ø)

... and 598 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8e7cb2...f20582b. Read the comment docs.

@sgomezvillamor sgomezvillamor merged commit efc5d31 into master Jan 10, 2025
202 of 207 checks passed
@sgomezvillamor sgomezvillamor deleted the feature/cus-3452-graphql-dataflowjob-container-aspect branch January 10, 2025 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Issues and Improvements to docs pending-submitter-merge product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants