feat(ingestion): extend feast plugin to ingest tags and owners #11784

Merged

Contributor

@margaridafernandes-trip margaridafernandes-trip commented Nov 4, 2024

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion and community-contribution labels Nov 4, 2024
@margaridafernandes-trip margaridafernandes-trip changed the title from "feat(ingestion): extend feast plugin to ingest tags for features" to "feat(ingestion): extend feast plugin to ingest tags and owners" Nov 8, 2024

for mapping in self.source_config.owner_mappings:
    if mapping["feast_owner_name"] == owner:
        ownership_type_class: OwnershipTypeClass = mapping["ownership_type"]
Collaborator

Minor: let's keep the naming consistent here:

datahub_owner_urn
datahub_ownership_type

Also, please add comments to the above Dict field to indicate which keys are required, and which are optional
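One way to make the required and optional keys explicit is a `TypedDict`. The sketch below is illustrative only: it assumes `feast_owner_name` is the one required key and that the two `datahub_*` keys are optional. `OwnerMapping` and `missing_required_keys` are hypothetical names, not part of the plugin.

```python
from typing import Dict, List, TypedDict


class _RequiredOwnerMappingKeys(TypedDict):
    # Assumed required: the owner name as it appears in Feast.
    feast_owner_name: str


class OwnerMapping(_RequiredOwnerMappingKeys, total=False):
    # Assumed optional: defaults apply when these are omitted.
    datahub_owner_urn: str
    datahub_ownership_type: str


def missing_required_keys(mapping: Dict[str, str]) -> List[str]:
    """Return the required keys absent from a raw mapping entry, sorted."""
    return sorted(OwnerMapping.__required_keys__ - set(mapping))
```

A check like this could run at config-load time so that a malformed entry is reported up front instead of silently producing no owner.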

Contributor Author

I added type hints to show that the keys are optional; when a key is not passed, a default value is used.

ownership_type_class: OwnershipTypeClass = mapping["ownership_type"]
return OwnerClass(
    owner=mapping["datahub_owner_urn"],
    type=ownership_type_class,
Collaborator

What if a user does not provide an ownership type?

Maybe we could have a default here (TECHNICAL owner or something).
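A minimal sketch of such a fallback, assuming the key is named `datahub_ownership_type` (the name proposed earlier in this review) and that the default is the string `"TECHNICAL_OWNER"`. `resolve_ownership_type` and `DEFAULT_OWNERSHIP_TYPE` are hypothetical names used only for illustration.

```python
from typing import Dict

# Assumption: "TECHNICAL_OWNER" stands in for the default suggested in review.
DEFAULT_OWNERSHIP_TYPE = "TECHNICAL_OWNER"


def resolve_ownership_type(mapping: Dict[str, str]) -> str:
    """Return the configured ownership type, or the default when the key is absent."""
    return mapping.get("datahub_ownership_type", DEFAULT_OWNERSHIP_TYPE)
```

Using `dict.get` with a default avoids the `KeyError` that a plain `mapping[...]` lookup would raise when the user leaves the key out.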

Contributor Author

done, thanks

def _create_owner_association(self, owner: str) -> OwnerClass:

    for mapping in self.source_config.owner_mappings:
        if mapping["feast_owner_name"] == owner:
Collaborator

We should add to the comment for the owner mappings argument above that if the user does NOT provide feast_owner_name, no owners will be extracted.

This part is important: many folks will expect that we simply map the Feast owner name straight into a DataHub URN.
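The matching behavior described here can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `find_owner_urn` is a hypothetical helper, and it does not model whatever default the plugin applies when `datahub_owner_urn` is missing.

```python
from typing import Dict, List, Optional


def find_owner_urn(owner: str, owner_mappings: List[Dict[str, str]]) -> Optional[str]:
    """Return the DataHub URN mapped to a Feast owner, or None if nothing matches."""
    for mapping in owner_mappings:
        # Entries without feast_owner_name can never match, so they yield no owner.
        if mapping.get("feast_owner_name") == owner:
            return mapping.get("datahub_owner_urn")
    # No automatic name-to-URN conversion: an unmapped owner is simply skipped.
    return None
```

For example, with a single mapping for `ml-team`, a feature view owned by `other-team` would get no owner at all, which is the behavior the documentation needs to call out.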

Contributor Author

I added the comment and also set a default value for datahub_owner_urn.

@@ -91,6 +96,9 @@ class FeastRepositorySourceConfig(ConfigModel):
    environment: str = Field(
        default=DEFAULT_ENV, description="Environment to use when constructing URNs"
    )
    owner_mappings: List[Dict[str, str]] = Field(
        default={}, description="Mapping of owner names to owner types"
Collaborator

As mentioned below, we should have a better description of this argument with mentions of which keys are required, what happens if keys are not specified, etc.

Contributor Author

I added the comment and also made owner_mappings Optional.

@@ -91,6 +96,9 @@ class FeastRepositorySourceConfig(ConfigModel):
    environment: str = Field(
        default=DEFAULT_ENV, description="Environment to use when constructing URNs"
    )
    owner_mappings: List[Dict[str, str]] = Field(
Collaborator

Let's also add 2 additional flags:

  • enable_owner_extraction: BOOL - If this is disabled, then we NEVER try to map owners. If this is enabled, then owner_mappings is REQUIRED to extract ownership. (We can use something like this for the docs)
  • enable_tag_extraction: BOOL - If this is disabled, then we NEVER try to extract tags.
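The gating proposed in these two bullets might look roughly like the sketch below. It uses a plain dataclass as a stand-in for the pydantic-based source config; `ExtractionFlags`, `should_extract_owners`, `should_extract_tags`, and the `False` defaults are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionFlags:
    # Hypothetical stand-in for the source config; defaults are assumed.
    enable_owner_extraction: bool = False
    enable_tag_extraction: bool = False
    owner_mappings: List[Dict[str, str]] = field(default_factory=list)


def should_extract_owners(config: ExtractionFlags) -> bool:
    """Owners are extracted only when the flag is on and mappings are provided."""
    return config.enable_owner_extraction and bool(config.owner_mappings)


def should_extract_tags(config: ExtractionFlags) -> bool:
    """Tags are extracted only when the flag is on."""
    return config.enable_tag_extraction
```

Keeping the two decisions in small predicates makes the documented rule ("disabled means never; enabled still needs `owner_mappings`") easy to test in isolation.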

Contributor Author

The code already enforces this behavior; I also left comments to reinforce it.

Collaborator

@jjoyce0510 jjoyce0510 left a comment

Left a few final comments. I think we are pretty close.

@datahub-cyborg datahub-cyborg bot added the pending-submitter-response label and removed the needs-review label Nov 25, 2024
@datahub-cyborg datahub-cyborg bot added the needs-review label and removed the pending-submitter-response label Nov 25, 2024
Collaborator

@jjoyce0510 jjoyce0510 left a comment

lgtm! thank you!

@datahub-cyborg datahub-cyborg bot added the merge-pending-ci label and removed the needs-review label Nov 27, 2024
@jjoyce0510 jjoyce0510 merged commit f3eda31 into datahub-project:master Nov 27, 2024
74 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Dec 17, 2024
Labels
community-contribution: PR or Issue raised by member(s) of DataHub Community
ingestion: PR or Issue related to the ingestion of metadata
merge-pending-ci: A PR that has passed review and should be merged once CI is green.
2 participants