Timx 288 marc field method refactor 2 #201

jonavellecuerdo · 2024-07-24T17:05:40Z

Purpose and background context

Field method refactor for transform class Marc (Part 2).

Added field methods and corresponding unit tests for the following fields: [content_type, contents, contributors, dates, editions, holdings, links].

Note: Links are derived from holdings for electronic items, so creating a field method for holdings required also creating a field method for links (to pass unit tests).

How can a reviewer manually see the effects of these changes?

Run make test and verify all unit tests are passing.

Run CLI command

pipenv run transform -i tests/fixtures/marc/marc_record_all_fields.xml -o output/marc-transformed-records.json -s alma

Output:

2024-07-24 13:06:08,055 INFO transmogrifier.cli.main(): Logger 'root' configured with level=INFO
2024-07-24 13:06:08,055 INFO transmogrifier.cli.main(): No Sentry DSN found, exceptions will not be sent to Sentry
2024-07-24 13:06:08,055 INFO transmogrifier.cli.main(): Running transform for source alma
2024-07-24 13:06:08,576 INFO transmogrifier.cli.main(): Completed transform, total records processed: 1, transformed records: 1, skipped records: 0, deleted records: 0
2024-07-24 13:06:08,576 INFO transmogrifier.cli.main(): Total time to complete transform: 0:00:00.521201

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/TIMX-288

Developer

All new ENV is documented in README
All new ENV has been added to staging and production environments
All related Jira tickets are linked in commit message(s)
Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

The commit message is clear and follows our guidelines (not just this PR message)
There are appropriate tests covering any new functionality
The provided documentation is sufficient for understanding any new functionality introduced
Any manual tests have been performed and verified
New dependencies are appropriate or there were no changes

jonavellecuerdo · 2024-07-24T17:10:35Z

transmogrifier/sources/xml/marc.py

+                contributors.extend(
+                    [
+                        timdex.Contributor(value=name, kind=kind.strip(" .,"))
+                        for kind in sorted(kinds, key=lambda k: k.lower())


I chose to sort the "kind" values before the timdex.Contributor instances are created so that, if a record is updated, the ordering of contributors in the TIMDEX record doesn't change every time (though I don't expect TIMDEX users to notice it). It also means we don't need to use set() in the get_contributor unit tests.
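To illustrate the ordering point above, here is a minimal sketch of collecting "kind" values in sets and sorting them case-insensitively before building contributor objects. The Contributor dataclass and the build_contributors helper are hypothetical stand-ins written for this example; they are not the timdex.Contributor class or the actual Marc.get_contributors implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Contributor:
    """Hypothetical stand-in for timdex.Contributor (value and kind only)."""

    value: str
    kind: str


def build_contributors(name_kind_pairs):
    """Group kinds by contributor name, then emit them in a stable order."""
    # Sets de-duplicate kinds, but their iteration order is not guaranteed.
    contributors_dict = defaultdict(set)
    for name, kind in name_kind_pairs:
        contributors_dict[name].add(kind)

    contributors = []
    for name, kinds in contributors_dict.items():
        contributors.extend(
            [
                Contributor(value=name, kind=kind.strip(" .,"))
                # Case-insensitive sort makes the output order repeatable.
                for kind in sorted(kinds, key=lambda k: k.lower())
            ]
        )
    return contributors


# Assumed sample data, not taken from a real record.
pairs = [
    ("Tran, Phong", "editor."),
    ("Tran, Phong", "Author,"),
    ("Tran, Phong", "composer"),
]
result = build_contributors(pairs)
```

Because the output is a list in a known order, a test can compare it directly against an expected list instead of converting both sides to sets.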

jonavellecuerdo · 2024-07-24T17:20:31Z

tests/sources/xml/test_marc.py

+def test_get_content_type_transforms_correctly_if_char_position_blank():
+    source_record = create_marc_source_record_stub(
+        leader_field_insert="<leader>03282n m  2200721Ki 4500</leader>"
+    )
+    assert Marc.get_content_type(source_record) is None


For tests involving values retrieved from fixed-length fields, I wasn't sure how to (or whether we can) write tests for the "missing" case. As shown in line 71, the whitespace between "n" and "m" (position 6 of the string) represents a blank, i.e., technically missing, character.

Same thought for the get_dates field method tests.

Agree that "missing" is basically impossible here!
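The blank-versus-missing distinction discussed in this thread can be sketched as follows. A fixed-length string always has some character at a given position, so the closest thing to "missing" is a blank. The helper name get_fixed_length_char is an assumption made for this example and is not part of the Marc class.

```python
from typing import Optional


def get_fixed_length_char(value: str, position: int) -> Optional[str]:
    """Return the character at a position, or None if it is blank or absent."""
    if position >= len(value):
        # The string is too short to contain this position at all.
        return None
    char = value[position]
    # A whitespace character counts as "no value" for a fixed-length field.
    return char if char.strip() else None


# Leader string taken from the unit test above.
leader = "03282n m  2200721Ki 4500"
blank_position = get_fixed_length_char(leader, 6)
filled_position = get_fixed_length_char(leader, 7)
```

Here position 6 holds a space, so the helper returns None, while position 7 returns "m".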

ghukill

Looking good to me! Left an optional stylistic comment, but approved with or without.

ghukill · 2024-07-24T20:34:43Z

transmogrifier/sources/xml/marc.py

+        contributor_marc_fields = [
+            {
+                "tag": "100",
+                "subfields": "abcq",
+            },
+            {
+                "tag": "110",
+                "subfields": "abc",
+            },
+            {
+                "tag": "111",
+                "subfields": "acdfgjq",
+            },
+            {
+                "tag": "700",
+                "subfields": "abcq",
+            },
+            {
+                "tag": "710",
+                "subfields": "abc",
+            },
+            {
+                "tag": "711",
+                "subfields": "acdfgjq",
+            },
+        ]
+
+        for contributor_marc_field in contributor_marc_fields:
+            for datafield in source_record.find_all(
+                "datafield", tag=contributor_marc_field["tag"]
+            ):
+                if contributor_name := (
+                    cls.create_subfield_value_string_from_datafield(
+                        datafield,
+                        contributor_marc_field["subfields"],
+                        " ",
+                    )
+                ):
+                    contributor_name = contributor_name.rstrip(" .,")
+                    contributor_kinds = cls.create_subfield_value_list_from_datafield(
+                        datafield, "e"
+                    )
+                    contributors_dict[contributor_name].update(contributor_kinds)


Overall, I think this is a nice reworking of this logic: build a data structure, then produce a list of Contributor objects. Nice work here!

Just stylistic -- no need to implement, but wanted to float -- is that we could potentially remove some dictionary overhead by using tuples:

contributor_marc_fields = [
    ("100", "abcq"),
    ("110", "abc"),
    ("111", "acdfgjq"),
    ("700", "abcq"),
    ("710", "abc"),
    ("711", "acdfgjq"),
]

for tag, subfields in contributor_marc_fields:
    for datafield in source_record.find_all("datafield", tag=tag):
        if contributor_name := (
            cls.create_subfield_value_string_from_datafield(
                datafield,
                subfields,
                " ",
            )
        ):
            contributor_name = contributor_name.rstrip(" .,")
            contributor_kinds = cls.create_subfield_value_list_from_datafield(
                datafield, "e"
            )
            contributors_dict[contributor_name].update(contributor_kinds)

I find the usage of tag and subfields a little easier to scan, but the effect is the same.

Ah, yes! I definitely like the idea of assigning the values in the dictionary / tuple to variables. If it's alright, I plan to do one final "cleanup" PR for Marc with readability improvements; e.g., another change I was hoping to make was using keyword arguments in the create_subfield_value_list_from_datafield / create_subfield_value_string_from_datafield function calls. 🤔

I do appreciate the explicitness of the dict at a glance re: tag and subfields vs a tuple but agree that they should shift to variables in the loop (contributor_marc_fields.items())! If we do go tuples instead, we should do that across the module for consistency (and anywhere else in the repo that pattern is used)
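One way to combine the two preferences in this thread is to keep the explicit dictionaries and bind their values to local names at the top of the loop. The sketch below assumes the list-of-dicts structure from the diff; the iter_tags_and_subfields helper is hypothetical and exists only to make the example self-contained. Since the structure is a list of dicts, the unpacking happens per entry rather than through a single .items() call.

```python
# Same tag/subfield pairs as in the diff above.
contributor_marc_fields = [
    {"tag": "100", "subfields": "abcq"},
    {"tag": "110", "subfields": "abc"},
    {"tag": "111", "subfields": "acdfgjq"},
    {"tag": "700", "subfields": "abcq"},
    {"tag": "710", "subfields": "abc"},
    {"tag": "711", "subfields": "acdfgjq"},
]


def iter_tags_and_subfields(marc_fields):
    """Yield (tag, subfields) so loop bodies can use short variable names."""
    for marc_field in marc_fields:
        # Bind the dict values once, instead of indexing throughout the loop.
        tag = marc_field["tag"]
        subfields = marc_field["subfields"]
        yield tag, subfields


pairs = list(iter_tags_and_subfields(contributor_marc_fields))
```

The definition stays self-describing at a glance, and the loop body reads the same as in the tuple version.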

ehanson8

Good stuff! A few comments but approved!

tests/sources/xml/test_marc.py

ehanson8 · 2024-07-25T13:50:49Z

transmogrifier/sources/xml/marc.py

@@ -857,6 +853,13 @@ def get_contributors(cls, source_record: Tag) -> list[timdex.Contributor] | None
                )
        return contributors or None

+    @classmethod
+    def get_dates(cls, source_record: Tag) -> list[timdex.Date] | None:
+        publication_year = cls._get_control_field(source_record)[7:11].strip()


I like this pattern
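The slice-and-strip pattern praised here can be sketched in isolation. This assumes that _get_control_field returns the MARC 008 control field as a string, where character positions 7-10 hold the first date; the function name get_publication_year and the sample values are assumptions made for this example.

```python
def get_publication_year(control_field: str) -> str:
    """Return positions 7-10 of an 008-style string with blanks removed."""
    # Slicing never raises on short strings; strip() turns blanks into "".
    return control_field[7:11].strip()


# Assumed sample 008 values, not taken from the test fixtures.
with_year = get_publication_year("170906s2016    fr mun|o         e zxx d")
without_year = get_publication_year("170906s        fr mun|o         e zxx d")
```

An empty string is falsy, so a caller can skip creating a date entry with a plain `if publication_year:` check.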

Why these changes are being introduced:
* These updates are required to implement the architecture described in the following ADR: https://github.com/MITLibraries/transmogrifier/blob/main/docs/adrs/0005-field-methods.md

How this addresses that need:
* Add field methods and corresponding unit tests: content_type, contents, contributors, dates, edition, holdings, links

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-288

jonavellecuerdo self-assigned this Jul 24, 2024

jonavellecuerdo force-pushed the TIMX-288-marc-field-method-refactor-2 branch from f45b0e8 to c290aa5 Compare July 24, 2024 17:13

jonavellecuerdo requested review from ehanson8 and ghukill July 24, 2024 17:20

jonavellecuerdo marked this pull request as draft July 24, 2024 17:27

jonavellecuerdo removed request for ghukill and ehanson8 July 24, 2024 17:27

jonavellecuerdo force-pushed the TIMX-288-marc-field-method-refactor-2 branch from c290aa5 to 891cd56 Compare July 24, 2024 19:04

jonavellecuerdo requested review from ghukill and ehanson8 July 24, 2024 19:08

jonavellecuerdo marked this pull request as ready for review July 24, 2024 19:08

ghukill approved these changes Jul 24, 2024

ehanson8 approved these changes Jul 25, 2024

jonavellecuerdo force-pushed the TIMX-288-marc-field-method-refactor-2 branch from 891cd56 to 371535e Compare July 25, 2024 15:51

jonavellecuerdo merged commit 76aabc0 into main Jul 25, 2024
3 checks passed

jonavellecuerdo deleted the TIMX-288-marc-field-method-refactor-2 branch July 25, 2024 15:53
