-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timx 288 marc field method refactor 2 #201
Conversation
contributors.extend( | ||
[ | ||
timdex.Contributor(value=name, kind=kind.strip(" .,")) | ||
for kind in sorted(kinds, key=lambda k: k.lower()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chose to sort the "kind" types before timdex.Contributor
instances are created so that, in the event that a record is updated, the ordering of contributors
in the TIMDEX record doesn't change every time (though I don't expect TIMDEX users to notice it). In any case, this allows us to avoid needing to use set()
in the get_contributor
unit tests.
f45b0e8
to
c290aa5
Compare
def test_get_content_type_transforms_correctly_if_char_position_blank(): | ||
source_record = create_marc_source_record_stub( | ||
leader_field_insert="<leader>03282n m 2200721Ki 4500</leader>" | ||
) | ||
assert Marc.get_content_type(source_record) is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For tests involving values retrieved from fixed length fields, I wasn't sure how to (or if we can) depict "missing" tests. As shown in line 71, the whitespace between "n" and "m" (position 6 of the string) represents a blank / technically missing character.
Same thought for the get_dates
field method tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that "missing" is basically impossible here!
c290aa5
to
891cd56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good to me! Left an optional stylistic comment, but approved with or without.
contributor_marc_fields = [ | ||
{ | ||
"tag": "100", | ||
"subfields": "abcq", | ||
}, | ||
{ | ||
"tag": "110", | ||
"subfields": "abc", | ||
}, | ||
{ | ||
"tag": "111", | ||
"subfields": "acdfgjq", | ||
}, | ||
{ | ||
"tag": "700", | ||
"subfields": "abcq", | ||
}, | ||
{ | ||
"tag": "710", | ||
"subfields": "abc", | ||
}, | ||
{ | ||
"tag": "711", | ||
"subfields": "acdfgjq", | ||
}, | ||
] | ||
|
||
for contributor_marc_field in contributor_marc_fields: | ||
for datafield in source_record.find_all( | ||
"datafield", tag=contributor_marc_field["tag"] | ||
): | ||
if contributor_name := ( | ||
cls.create_subfield_value_string_from_datafield( | ||
datafield, | ||
contributor_marc_field["subfields"], | ||
" ", | ||
) | ||
): | ||
contributor_name = contributor_name.rstrip(" .,") | ||
contributor_kinds = cls.create_subfield_value_list_from_datafield( | ||
datafield, "e" | ||
) | ||
contributors_dict[contributor_name].update(contributor_kinds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is a nice reworking of this logic: build a data structure, produce a list Contributor
objects. Nice work here!
Just stylistic -- no need to implement, but wanted to float -- is that we could potentially remove some dictionary overhead by using tuples:
contributor_marc_fields = [
("100", "abcq"),
("110", "abc"),
("111", "acdfgjq"),
("700", "abcq"),
("710", "abc"),
("711", "acdfgjq"),
]
for tag, subfields in contributor_marc_fields:
for datafield in source_record.find_all("datafield", tag=tag):
if contributor_name := (
cls.create_subfield_value_string_from_datafield(
datafield,
subfields,
" ",
)
):
contributor_name = contributor_name.rstrip(" .,")
contributor_kinds = cls.create_subfield_value_list_from_datafield(
datafield, "e"
)
contributors_dict[contributor_name].update(contributor_kinds)
I find the usage of tag
and subfields
a little easier to scan, but effect is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes! I definitely like the idea of assigning the values in the dictionary / tuple to variables. If it's alright, I will plan on doing one final "cleanup" PR for Marc
for improvements to readability; e.g., another change I was hoping to make was using keyword arguments in the create_subfield_value_list_from_datafield
/ create_subfield_value_string_from_datafield
function calls. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do appreciate the explicitness of the dict at a glance re: tag
and subfields
vs a tuple but agree that they should shift to variables in the loop (contributor_marc_fields.items()
)! If we do go tuples instead, we should do that across the module for consistency (and anywhere else in the repo that pattern is used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff! A few comments but approved!
@@ -857,6 +853,13 @@ def get_contributors(cls, source_record: Tag) -> list[timdex.Contributor] | None | |||
) | |||
return contributors or None | |||
|
|||
@classmethod | |||
def get_dates(cls, source_record: Tag) -> list[timdex.Date] | None: | |||
publication_year = cls._get_control_field(source_record)[7:11].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this pattern
Why these changes are being introduced: * These updates are required to implement the architecture described in the following ADR: https://github.com/MITLibraries/transmogrifier/blob/main/docs/adrs/0005-field-methods.md How this addresses that need: * Add field methods and corresponding unit tests: content_type, contents, contributors, dates, edition, holdings, links Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-288
891cd56
to
371535e
Compare
Purpose and background context
Field method refactor for transform class
Marc
(Part 2).content_type
,contents
,contributors
,dates
,editions
,holdings
,links
].Note: Links are derived from
holdings
for electronic items, so creating a field method forholdings
required creating a field method forlink
(to pass unit tests).How can a reviewer manually see the effects of these changes?
make test
and verify all unit tests are passing.Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
https://mitlibraries.atlassian.net/browse/TIMX-288
Developer
Code Reviewer(s)