Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timx 291 orchestration #205

Merged
merged 15 commits into from
Aug 9, 2024
Merged

Timx 291 orchestration #205

merged 15 commits into from
Aug 9, 2024

Conversation

ehanson8
Copy link
Contributor

@ehanson8 ehanson8 commented Aug 1, 2024

Purpose and background context

Update the application's orchestration to streamline the code and improve maintainability

How can a reviewer manually see the effects of these changes?

Run a JSON and an XML transform to ensure that records are still transformed successfully after the orchestration refactor:

pipenv run transform -o output/gismit.json -i  tests/fixtures/aardvark/aardvark_record_all_fields.jsonl -s gismit
pipenv run transform -i tests/fixtures/dspace/dspace_dim_records.xml -o output/dspace-transformed-records.json -s whoas

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

* Update get_valid_title, get_source_link, and get_timdex_record_id to instance methods in Transformer, JSONTransformer, and XMLTransformer classes
* Update get_source_link and get_timdex_record_id to instance methods in Aardvark and SpringshareOaiDC classes
* Update corresponding unit tests
Why these changes are being introduced:
* Refactoring this method fundamentally simplifies the application and allows for the removal of several methods

How this addresses that need:
* Refactor _transform method
* Add generate_derived_fields and get_optional_field_methods methods
* Add get_required_field_names and get_optional_field_names methods to TimdexRecord class

Side effects of this change:
* All transform classes with need to be updated in subsequent commits

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-291
* Remove get_required_fields method from Transformer, JSONTransformer, and XMLTransformer classes
* Remove unit tests for this method
* Some field methods omitted the source_record param because they delivered static values. This caused issues with the new orchestration so an unused param was added.
* Refactor create_dates_and_locations_from_publishers and create_locations_from_spatial_subjects method to use TimdexRecord instances
* Call methods in generate_derived_fields
* Update corresponding unit tests
* Refactor generate_citation to use a TimdexRecord instance
* Shift call into generate_derived_fields method
* Update corresponding unit tests
* Refactor MITAardvark.get_rights to instance method in order to access source attribute
* Update corresponding unit tests
* Remove get_optional_fields method from all classes
* Remove unit tests for this method
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments!

transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/xml/dspace_dim.py Show resolved Hide resolved
transmogrifier/sources/xml/datacite.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
* Refactor derived field methods to return iterators
* Split create_dates_and_locations_from_publishers method based on object type
* Refactor generate_derived_fields method to accommodate whether attributes exist or not
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for organizing these comments as a "review". Submitting a review with comments for discussion here.

transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/xml/dspace_dim.py Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
tests/sources/test_transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
* Remove if not record block from __next__ method as it relates to an outdated method of flow control
*Rename _transform to transform and remove transform methods from JSONTransformer and XMLTransformer
* Refactor generate_derived_fields and get_optional_field_methods
* Remove try/except block from transform method
@ehanson8 ehanson8 marked this pull request as ready for review August 5, 2024 19:33
@ehanson8
Copy link
Contributor Author

ehanson8 commented Aug 5, 2024

@ghukill @jonavellecuerdo I'm making this PR official, thank you both so very much for all of the discussion!

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this looks superb! Nice work @ehanson8. Thanks for taking a complex topic, faciliating discussions around changes, and materializing all that in this PR. While large, it's easy to follow, and the changes feel in-step with things we've been discussing.

As promised, saved some granular and nit picky questions / comments for this "real" PR. Functionality-wise, really no concerns: I see this operating as intended, and tests and linting is looking good. Comments are mostly stylistic, method organization, and some questions that may be better spun off as separate issues.

Curious thoughts on a couple of the questions, but close to approval with or without any changes.

transmogrifier/sources/transformer.py Show resolved Hide resolved
transmogrifier/sources/xmltransformer.py Show resolved Hide resolved
transmogrifier/sources/transformer.py Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/xmltransformer.py Show resolved Hide resolved
transmogrifier/helpers.py Show resolved Hide resolved
transmogrifier/sources/xmltransformer.py Show resolved Hide resolved
@jonavellecuerdo
Copy link
Contributor

@ehanson8 The additional updates are looking good to me! Waiting for response re: ordering (had a follow-up question for @ghukill ) and think one change request still needs to be implemented.

* Remove obsolete lxml comments and linting overrides
* Update Transformer.transform method docstring
* Refactor Transformer.get_optional_field_methods
@ehanson8
Copy link
Contributor Author

ehanson8 commented Aug 8, 2024

@ghukill @jonavellecuerdo Changes pushed!

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it so!

@ehanson8 ehanson8 merged commit 1910426 into main Aug 9, 2024
5 checks passed
@ehanson8 ehanson8 deleted the TIMX-291-orchestration branch August 9, 2024 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants