
BI-2308 - Field Book BrAPI Integration - Importing Fields #402

Merged: 13 commits merged on Oct 4, 2024


nickpalladino
Member

@nickpalladino nickpalladino commented Sep 18, 2024

Description

Story: BI-2308

  • Branch name doesn't match but it's for BI-2308
  • Modified removeProgramKeyAndUnknownAdditionalData because the regex seemed incorrect: it was removing everything up to the second ], so the resulting pedigree string was wrong. The method is used in a number of different places, so we should make sure there aren't regressions. Not sure whether issues have been noticed in the past.
  • Ended up not removing the setDbIds methods, even though removing them was discussed as a possible approach in IT planning. The scope of that change was much larger because the BrAPI objects in the cache are keyed off the DeltaBreed UUID, not the BrAPI dbId.
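A minimal Java sketch of the greedy vs. non-greedy difference described above, assuming a hypothetical pedigree string in which each parent carries a bracketed suffix such as `[PROG-1]`. `BracketStripDemo`, `stripGreedy`, and `stripNonGreedy` are illustrative names only; this is not the actual removeProgramKeyAndUnknownAdditionalData implementation, and the real input format may differ. The non-greedy pattern mirrors the one described in the method's Javadoc.

```java
import java.util.regex.Pattern;

public class BracketStripDemo {
    // Greedy: ".*" runs to the LAST "]" in the string, swallowing everything in between.
    private static final Pattern GREEDY = Pattern.compile("\\s*\\[.*\\]\\s*");
    // Non-greedy: ".*?" stops at the FIRST "]" after each "[".
    private static final Pattern NON_GREEDY = Pattern.compile("\\s*\\[.*?\\]\\s*");

    static String stripGreedy(String s) {
        return GREEDY.matcher(s).replaceAll("");
    }

    static String stripNonGreedy(String s) {
        return NON_GREEDY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical pedigree: two parents, each tagged with a bracketed program key.
        String pedigree = "GERM1 [PROG-1]/GERM2 [PROG-2]";
        System.out.println(stripGreedy(pedigree));    // GERM1
        System.out.println(stripNonGreedy(pedigree)); // GERM1/GERM2
    }
}
```

With the greedy pattern the second parent disappears along with both keys, which matches the "removing until the second ]" symptom.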

Dependencies

  • Field Book v5.6.24

Testing

GIVEN a field has been selected for import on the Field Book Fields page
WHEN the field import is started
THEN the field should load successfully in a time proportional to the amount of data, without hanging indefinitely or showing an error message

GIVEN a Field with germplasm containing synonyms and pedigree has been imported
WHEN the attributes are viewed on the collect page
THEN synonyms and pedigree strings should be shown with program keys and accession numbers stripped out

Checklist:

  • I have performed a self-review of my own code
  • I have tested my code and ensured it meets the acceptance criteria of the story
  • I have tested that my code works with both the brapi-java-server and BreedBase
  • I have created/modified unit tests to cover this change
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to documentation
  • I have run TAF: <please include a link to TAF run>

@github-actions github-actions bot added the bug Something isn't working label Sep 18, 2024
@nickpalladino nickpalladino marked this pull request as ready for review September 19, 2024 15:29
@nickpalladino nickpalladino requested review from a team, davedrp, dmeidlin and mlm483 and removed request for a team September 19, 2024 15:55
// convert dbIds to DeltaBreed UUID
BrAPIGermplasmListResponse response = brapiGermplasm.getBody().getLeft().get();
List<BrAPIGermplasm> germplasm = response.getResult().getData();
germplasm.forEach(g -> setDbIdsAndStripProgramKeys(g, programKey));
Contributor

Suggested change
germplasm.forEach(g -> setDbIdsAndStripProgramKeys(g, programKey));
batchProcessGermplasm(germplasm, programKey);

Comment on lines +165 to +177
/**
* \s*: Matches zero or more whitespace characters before the opening bracket.
* \[: Matches the opening square bracket [. The backslash is used to escape the special meaning of [ in regex.
* .*?: Matches any character (except newline) zero or more times, non-greedily.
* . matches any character except newline.
* * means "zero or more times".
* ? makes the matching non-greedy, so it stops at the first closing bracket.
* \]: Matches the closing square bracket ]. Again, the backslash is used to escape it.
* \s*: Matches zero or more whitespace characters after the closing bracket.
* @param original
* @param programKey
* @return
*/
Contributor

Unit tests for this method covering these cases, with specific assertions on the expected format of synonyms and pedigree strings after stripping, would help catch regressions later. New edge cases could be added as they are found, along with tests for very large fields or complex pedigrees.

Member Author

Created card BI-2325

Contributor

@dmeidlin dmeidlin left a comment

The setDbIdsAndStripProgramKeys method in BrAPIGermplasmController.java processes each germplasm item individually, which could hurt performance on large datasets. I've added a method that processes a batch of germplasm, compiles the regex only once, and uses parallel streams where possible.
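A rough sketch of the batch idea, assuming the stripping reduces to one regex replacement per string. `GermplasmStripper` and `stripAll` are hypothetical names, not the method added in this PR. The `Pattern` is compiled once and shared across threads, which is safe because `Pattern` instances are immutable; each element gets its own `Matcher`, since `Matcher` is not thread-safe.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class GermplasmStripper {
    // Compiled once when the class loads instead of once per germplasm record.
    private static final Pattern BRACKETED = Pattern.compile("\\s*\\[.*?\\]\\s*");

    // Strips bracketed suffixes from every value; parallelStream() may split the work across threads.
    static List<String> stripAll(List<String> values) {
        return values.parallelStream()
                .map(v -> v == null ? null : BRACKETED.matcher(v).replaceAll(""))
                .collect(Collectors.toList()); // encounter order of the source List is preserved
    }

    public static void main(String[] args) {
        List<String> pedigrees = List.of("A [P-1]/B [P-2]", "C [P-3]", "D");
        System.out.println(stripAll(pedigrees)); // [A/B, C, D]
    }
}
```

Parallel streams generally only pay off for large lists; for small fields a sequential `stream()` is likely just as fast.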

For clarity, I extracted the logic for replacing delta ids with dbIds into a private method to keep the body of the controller method lean.

I also think unit tests for removeProgramKeyAndUnknownAdditionalData are a good idea, but they probably exceed the effort estimate for this card. The card that checks for regressions in the other places this method is used should include writing those tests.

dmeidlin and others added 2 commits September 20, 2024 14:58
Contributor

@mlm483 mlm483 left a comment

I tested locally with Field Book version 5.6.24.

The first test passed; I was able to load fields at around 1,000 germplasm/minute, which seems fast enough for now.

The second test is less clear to me: I'm not sure where I'm meant to see synonym or pedigree info, and I didn't see any. The germplasm names on the collect screen didn't have the program key, which is good.

Also, using Wireshark, I see one request from bi-api to brapi-server to the /brapi/v2/search/germplasm POST endpoint with an external reference and a list of germplasmDbIds. A GET is then made for the search results (/brapi/v2/search/germplasm/{searchResultDbId}), but oddly the results are not ready when this request is made, and the client (bi-api) never seems to make a subsequent request, so I'm not sure what's happening there.

Contributor

@dmeidlin dmeidlin left a comment

Just re-submitting initial request

@nickpalladino
Member Author

I tested locally with Field Book version 5.6.24.

The first test passed, I was able to load fields at around 1000 germplasm/minute which seems fast enough for now.

The second test is less clear to me, I'm not sure where I'm meant to see synonym or pedigree info, but I didn't see any. The germplasm names in the collect screen didn't have the program key which is good.

If you included synonyms and pedigree information in the germplasm, you should be able to tap the attributes on the collect page to open a dialog where you can increase the number of attributes shown and choose which attributes to display. Synonyms and pedigree should be selectable options in the list.

Also, using wireshark, I see there was one request made from bi-api to brapi-server to the /brapi/v2/search/germplasm POST endpoint with an external reference and a list of germplasmDbIds. A get is then made for the search results (/brapi/v2/search/germplasm/{searchResultDbId}) but what's odd is the results are not ready when this request is made, and the client (bi-api) never seems to make a subsequent request, so I'm not sure what's happening there.

From what I was seeing, the BrAPI server gives an immediate response when the number of dbIds being searched is below a certain threshold, and switches to returning a searchResultDbId above that point. Field Book should keep making requests until it gets the result data.
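As we understand the BrAPI v2 specification, search endpoints are asynchronous: the POST either returns results directly or returns a search result ID, and the client repeats the GET on `/search/germplasm/{searchResultDbId}` while the server answers 202 Accepted ("not ready yet"). Below is a generic, hedged sketch of that client-side retry loop; `SearchPoller`, `pollUntilReady`, and the `Optional`-returning fetch function are hypothetical stand-ins for a real HTTP call, not Field Book's or bi-api's code.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

public class SearchPoller {
    /**
     * Calls fetch repeatedly until it returns a value or maxAttempts is reached.
     * In a real client, fetch would perform the GET on the search-results endpoint
     * and return Optional.empty() while the server reports "not ready yet".
     */
    static <T> Optional<T> pollUntilReady(Supplier<Optional<T>> fetch, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = fetch.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis()); // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return Optional.empty();
                }
            }
        }
        return Optional.empty(); // gave up; the caller decides how to report the timeout
    }

    public static void main(String[] args) {
        // Fake endpoint: "not ready" for the first two calls, then returns data.
        int[] calls = {0};
        Supplier<Optional<String>> fakeFetch = () ->
                ++calls[0] < 3 ? Optional.empty() : Optional.of("germplasm page 0");
        Optional<String> data = pollUntilReady(fakeFetch, 5, Duration.ofMillis(10));
        System.out.println(data.orElse("timed out") + " after " + calls[0] + " calls");
    }
}
```

A client that stops after the first not-ready GET would show exactly the single unanswered request seen in Wireshark.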

@mlm483
Contributor

mlm483 commented Oct 1, 2024

With a small amount of data, I was able to see synonyms and pedigree. Using Wireshark, I could see that the response to the POST to the /brapi/v2/search/germplasm endpoint contained a searchResultDbId, and the data was returned in the response to the first GET request to the /brapi/v2/search/germplasm/{searchResultDbId} endpoint.

Screenshot 2024-10-01 at 9 44 35 AM

With a moderate amount of data (5000 Germplasm, ~1000 referenced by Observation Units), Field Book times out after 2 minutes when previewing the field (120 seconds is the default value for BRAPI_TIMEOUT in Field Book). The request it times out on is the germplasm search. Looking at the bi-api logs, I can see it hits the cache once for each Germplasm record; I think it's important to make that more efficient in this story.

Screenshot 2024-10-01 at 10 14 22 AM
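One way to avoid one cache hit per Germplasm record is to read the program's cached entries once and resolve every requested ID from an in-memory map. This sketch is hypothetical: `BulkCacheLookup`, `ProgramCache`, and `getAll` are illustrative stand-ins, not bi-api's cache API, and it assumes the cache can return all of a program's germplasm in one call keyed by DeltaBreed UUID (the PR description notes the cache is keyed off that UUID).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class BulkCacheLookup {
    // Hypothetical cache abstraction: one call returns every cached germplasm for a program,
    // keyed by DeltaBreed UUID.
    interface ProgramCache {
        Map<String, String> getAll(String programId);
    }

    // One cache read for the whole request, then a constant-time map lookup per requested ID.
    static List<String> resolve(ProgramCache cache, String programId, List<String> uuids) {
        Map<String, String> byUuid = cache.getAll(programId);
        List<String> found = new ArrayList<>();
        for (String id : uuids) {
            String g = byUuid.get(id);
            if (g != null) {
                found.add(g); // IDs missing from the cache are skipped
            }
        }
        return found;
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        ProgramCache fake = programId -> {
            reads.incrementAndGet();
            return Map.of("uuid-1", "Germ A", "uuid-2", "Germ B", "uuid-3", "Germ C");
        };
        List<String> result = resolve(fake, "prog-1", List.of("uuid-1", "uuid-3", "uuid-9"));
        System.out.println(result + " with " + reads.get() + " cache read(s)");
    }
}
```

The trade-off is memory: loading a whole program's germplasm at once is cheap for thousands of records but may need paging for much larger programs.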

Contributor

@mlm483 mlm483 left a comment

Other than the Field Book bug (I see your issue), this is working as expected.

@nickpalladino nickpalladino merged commit e21aaa3 into develop Oct 4, 2024
1 check failed
@nickpalladino nickpalladino deleted the bug/BI-2309 branch October 4, 2024 19:59
Labels
bug Something isn't working
5 participants