Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timx 288 marc field method refactor 2 #201

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Jul 24, 2024

Purpose and background context

Field method refactor for transform class Marc (Part 2).

  • Added field methods and corresponding unit tests for the following fields: [content_type, contents, contributors, dates, editions, holdings, links].

Note: Links are derived from holdings for electronic items, so creating a field method for holdings required creating a field method for link (to pass unit tests).

How can a reviewer manually see the effects of these changes?

  1. Run make test and verify all unit tests are passing.
  2. Run CLI command
    pipenv run transform -i tests/fixtures/marc/marc_record_all_fields.xml -o output/marc-transformed-records.json -s alma
    
    Output:
    2024-07-24 13:06:08,055 INFO transmogrifier.cli.main(): Logger 'root' configured with level=INFO
    2024-07-24 13:06:08,055 INFO transmogrifier.cli.main(): No Sentry DSN found, exceptions will not be sent to Sentry
    2024-07-24 13:06:08,055 INFO transmogrifier.cli.main(): Running transform for source alma
    2024-07-24 13:06:08,576 INFO transmogrifier.cli.main(): Completed transform, total records processed: 1, transformed records: 1, skipped records: 0, deleted records: 0
    2024-07-24 13:06:08,576 INFO transmogrifier.cli.main(): Total time to complete transform: 0:00:00.521201
    

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/TIMX-288

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

@jonavellecuerdo jonavellecuerdo self-assigned this Jul 24, 2024
contributors.extend(
[
timdex.Contributor(value=name, kind=kind.strip(" .,"))
for kind in sorted(kinds, key=lambda k: k.lower())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chose to sort the "kind" types before timdex.Contributor instances are created so that, in the event that a record is updated, the ordering of contributors in the TIMDEX record doesn't change every time (though I don't expect TIMDEX users to notice it). In any case, this allows us to avoid needing to use set() in the get_contributor unit tests.

@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-288-marc-field-method-refactor-2 branch from f45b0e8 to c290aa5 Compare July 24, 2024 17:13
Comment on lines +969 to +973
def test_get_content_type_transforms_correctly_if_char_position_blank():
source_record = create_marc_source_record_stub(
leader_field_insert="<leader>03282n m 2200721Ki 4500</leader>"
)
assert Marc.get_content_type(source_record) is None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests involving values retrieved from fixed length fields, I wasn't sure how to (or if we can) depict "missing" tests. As shown in line 71, the whitespace between "n" and "m" (position 6 of the string) represents a blank / technically missing character.

Same thought for the get_dates field method tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that "missing" is basically impossible here!

@jonavellecuerdo jonavellecuerdo marked this pull request as draft July 24, 2024 17:27
@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-288-marc-field-method-refactor-2 branch from c290aa5 to 891cd56 Compare July 24, 2024 19:04
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review July 24, 2024 19:08
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me! Left an optional stylistic comment, but approved with or without.

Comment on lines +700 to +742
contributor_marc_fields = [
{
"tag": "100",
"subfields": "abcq",
},
{
"tag": "110",
"subfields": "abc",
},
{
"tag": "111",
"subfields": "acdfgjq",
},
{
"tag": "700",
"subfields": "abcq",
},
{
"tag": "710",
"subfields": "abc",
},
{
"tag": "711",
"subfields": "acdfgjq",
},
]

for contributor_marc_field in contributor_marc_fields:
for datafield in source_record.find_all(
"datafield", tag=contributor_marc_field["tag"]
):
if contributor_name := (
cls.create_subfield_value_string_from_datafield(
datafield,
contributor_marc_field["subfields"],
" ",
)
):
contributor_name = contributor_name.rstrip(" .,")
contributor_kinds = cls.create_subfield_value_list_from_datafield(
datafield, "e"
)
contributors_dict[contributor_name].update(contributor_kinds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is a nice reworking of this logic: build a data structure, produce a list Contributor objects. Nice work here!

Just stylistic -- no need to implement, but wanted to float -- is that we could potentially remove some dictionary overhead by using tuples:

contributor_marc_fields = [
    ("100", "abcq"),
    ("110", "abc"),
    ("111", "acdfgjq"),
    ("700", "abcq"),
    ("710", "abc"),
    ("711", "acdfgjq"),
]

for tag, subfields in contributor_marc_fields:
    for datafield in source_record.find_all("datafield", tag=tag):
        if contributor_name := (
            cls.create_subfield_value_string_from_datafield(
                datafield,
                subfields,
                " ",
            )
        ):
            contributor_name = contributor_name.rstrip(" .,")
            contributor_kinds = cls.create_subfield_value_list_from_datafield(
                datafield, "e"
            )
            contributors_dict[contributor_name].update(contributor_kinds)

I find the usage of tag and subfields a little easier to scan, but effect is the same.

Copy link
Contributor Author

@jonavellecuerdo jonavellecuerdo Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes! I definitely like the idea of assigning the values in the dictionary / tuple to variables. If it's alright, I will plan on doing one final "cleanup" PR for Marc for improvements to readability; e.g., another change I was hoping to make was using keyword arguments in the create_subfield_value_list_from_datafield/ create_subfield_value_string_from_datafield function calls. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do appreciate the explicitness of the dict at a glance re: tag and subfields vs a tuple but agree that they should shift to variables in the loop (contributor_marc_fields.items())! If we do go tuples instead, we should do that across the module for consistency (and anywhere else in the repo that pattern is used)

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! A few comments but approved!

tests/sources/xml/test_marc.py Outdated Show resolved Hide resolved
@@ -857,6 +853,13 @@ def get_contributors(cls, source_record: Tag) -> list[timdex.Contributor] | None
)
return contributors or None

@classmethod
def get_dates(cls, source_record: Tag) -> list[timdex.Date] | None:
publication_year = cls._get_control_field(source_record)[7:11].strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this pattern

Why these changes are being introduced:
* These updates are required to implement the architecture described
in the following ADR: https://github.com/MITLibraries/transmogrifier/blob/main/docs/adrs/0005-field-methods.md

How this addresses that need:
* Add field methods and corresponding unit tests:
  content_type, contents, contributors, dates,
  edition, holdings, links

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-288
@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-288-marc-field-method-refactor-2 branch from 891cd56 to 371535e Compare July 25, 2024 15:51
@jonavellecuerdo jonavellecuerdo merged commit 76aabc0 into main Jul 25, 2024
3 checks passed
@jonavellecuerdo jonavellecuerdo deleted the TIMX-288-marc-field-method-refactor-2 branch July 25, 2024 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants