Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FDS-1725] Missing entityId handling testing #1496

Merged
merged 38 commits into from
Sep 19, 2024

Conversation

BWMac
Copy link
Collaborator

@BWMac BWMac commented Sep 6, 2024

Description:
This PR follows up on #1456, where the filenameExists validation rule was implemented. This rule handles cases where a file path present in a manifest does not exist in a dataset and cases where the entityId provided in a manifest row does not match with its corresponding file path.

This PR adds additional error handling for situations including:

  • The entityId field in a manifest row is empty
  • The entityId provided in a manifest row does not exist in the dataset

Previously, an outer join was used in the validation rule followed by a line that handled cases where there were more rows in the dataset than the manifest. The outer join implementation did not always maintain the order of manifest rows resulting in incorrect errors on manifest rows. I changed this outer join to a left join on the manifest to handle both issues.

The row number indexing for filename_validation is also updated to match the other validation rules for consistency.

I have also updated generate_filename_error to handle the new error cases.

Testing:

  • New integration test cases for filename_validation have been added as needed.
  • Unit tests for filename_validation and generate_filename_error have been added

@BWMac BWMac marked this pull request as ready for review September 6, 2024 19:34
Copy link
Collaborator

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@GiaJordan
Copy link
Contributor

For the new cases where we're checking if entityIds don't exist at all or are missing from the manifest we should also add validation error messages to generate_filename_error that specify to the user what the issue with their metadata was. And then the validation integration test should be updated to check for those error messages for the test cases you add to the manifest

@BWMac
Copy link
Collaborator Author

BWMac commented Sep 6, 2024

Thanks for the comments @GiaJordan! I am going to revert this back to being a draft for now and work on those updates.

@BWMac BWMac marked this pull request as draft September 6, 2024 22:02
@BWMac BWMac marked this pull request as ready for review September 16, 2024 17:09
Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking very good! I left a few comments. But I have no problems having it pre-approved

@linglp
Copy link
Contributor

linglp commented Sep 18, 2024

@BWMac sorry one more comment. Since there's no test related to CLI, I could test manually. Give me some time :) I will report back shortly.

@linglp
Copy link
Contributor

linglp commented Sep 18, 2024

@BWMac thanks for your patience. I tested the validation CLI with my own test manifest and it works. And I also realized that we don't have "dataset scope" parameter added for manifest submission so we can't use the file name validation during submission. But this issue is definitely outside of your PR! Thanks for your hard work. After the tests are all passing, I think you could merge 👍🏼

Copy link

@BWMac BWMac dismissed GiaJordan’s stale review September 19, 2024 20:24

Approved by other code owners

@BWMac BWMac merged commit 05836c9 into develop Sep 19, 2024
7 checks passed
@BWMac BWMac deleted the bwmac/FDS-1725/missing_entityID_testing branch September 19, 2024 20:25
@BWMac BWMac mentioned this pull request Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants