From 1e0ccb5d12b1993a474ff125290c84bade14b426 Mon Sep 17 00:00:00 2001 From: Eric Hanson Date: Tue, 14 May 2024 16:22:29 -0400 Subject: [PATCH 1/3] Field method ADR Why these changes are being introduced: * Discussion is needed on a proposed field method refactor How this addresses that need: * Add ADR for proposed field method refactor Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-275 --- docs/adrs/0005-field-methods.md | 67 +++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 docs/adrs/0005-field-methods.md diff --git a/docs/adrs/0005-field-methods.md b/docs/adrs/0005-field-methods.md new file mode 100644 index 0000000..c7be0e8 --- /dev/null +++ b/docs/adrs/0005-field-methods.md @@ -0,0 +1,67 @@ +# 1. Field methods refactor + +Date: 2024-05-14 + +## Status + +Proposed + +## Context + +### Goal +Simplify the application by refactoring it to use field methods to populate a `TimdexRecord` instance. + +### Background +A `TimdexRecord` contains the following fields: + +* REQUIRED + * `source` + * `source_link` + * `timdex_record_id` + * `title` + + * OPTIONAL + * `alternate_titles` + * `call_numbers` + * `citation` + * `content_type` + * `contents` + * `contributors` + * `dates` + * `edition` + * `file_formats` + * `format` + * `funding_information` + * `holdings` + * `identifiers` + * `languages` + * `links` + * `literary_form` + * `locations` + * `notes` + * `numbering` + * `physical_description` + * `publication_frequency` + * `publishers` + * `related_items` + * `rights` + * `subjects` + * `summary` + +Currently much of the transformation logic for each source is found in the `get_optional_fields` method which extracts data for most of the fields. This results in very large methods as well as inefficient orchestration and testing. + +A field method approach would address many of these complications. All of the required fields have dedicated field methods as well as some of the complex fields that are called by `get_optional_fields`. This refactor can be seen as an extension of that approach. + +## Decision + +Refactor each source transform to use dedicated methods for each field (e.g. `get_dates`,`get_contributors`, `get_title`) with associated unit tests to cover the expected data scenarios. + + +## Consequences +Field methods should ultimately simplfy the source transforms, and the overall application, by providing an easily repeatable structure with method docstrings offering additional context for the more complicated logic required for some fields. + +Testing should also be significantly improved. The current testing suite is built around very large fixtures and unit tests meant to encompass the expected data scenarios for each source's records. They are not easy to parse and can be a barrier to new developers trying to understand the application. Separate tests for each field method should simplfy the testing infrastructure and make it easier to maintain. + +Logging and associated analytics should also be easier to manage and provide a better view of the data transformed by this application. + +During this refactor, there should be no changes to the records produced the application. However, the code changes will benefit developers by making the application easier to maintain, extend, and test. \ No newline at end of file From 129b424670803bd8bfc981b73b18a6583831f2eb Mon Sep 17 00:00:00 2001 From: Eric Hanson Date: Wed, 15 May 2024 09:49:18 -0400 Subject: [PATCH 2/3] Update ADR based on discussion in PR #174 --- docs/adrs/0005-field-methods.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/adrs/0005-field-methods.md b/docs/adrs/0005-field-methods.md index c7be0e8..ea0e104 100644 --- a/docs/adrs/0005-field-methods.md +++ b/docs/adrs/0005-field-methods.md @@ -48,9 +48,9 @@ A `TimdexRecord` contains the following fields: * `subjects` * `summary` -Currently much of the transformation logic for each source is found in the `get_optional_fields` method which extracts data for most of the fields. This results in very large methods as well as inefficient orchestration and testing. +Currently much of the transformation logic for each source is found in the `get_optional_fields` method which extracts data for all optional fields. This results in very large methods as well as inefficient orchestration and testing. -A field method approach would address many of these complications. All of the required fields have dedicated field methods as well as some of the complex fields that are called by `get_optional_fields`. This refactor can be seen as an extension of that approach. +A field method approach would address many of these complications, where each `TimdexRecord` field has a corresponding method containing all of the logic related to extracting and formatting the values for that field. All of the required fields have dedicated field methods as well as some of the complex fields that are called by `get_optional_fields`. This refactor can be seen as an extension of that approach. ## Decision @@ -62,6 +62,6 @@ Field methods should ultimately simplfy the source transforms, and the overall a Testing should also be significantly improved. The current testing suite is built around very large fixtures and unit tests meant to encompass the expected data scenarios for each source's records. They are not easy to parse and can be a barrier to new developers trying to understand the application. Separate tests for each field method should simplfy the testing infrastructure and make it easier to maintain. -Logging and associated analytics should also be easier to manage and provide a better view of the data transformed by this application. +Logging and associated analytics should also be easier to manage and provide a better view of the data transformed by this application. During orchestration of the transform, the field methods will provide a consistent and shared way of reporting errors or outlier behavior when transforming fields. During this refactor, there should be no changes to the records produced the application. However, the code changes will benefit developers by making the application easier to maintain, extend, and test. \ No newline at end of file From 7fd05a3b19896bf2a828a6d86c0088f433ed993e Mon Sep 17 00:00:00 2001 From: Eric Hanson Date: Wed, 15 May 2024 13:35:27 -0400 Subject: [PATCH 3/3] Update ADR status to accepted after PR approval --- docs/adrs/0005-field-methods.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/adrs/0005-field-methods.md b/docs/adrs/0005-field-methods.md index ea0e104..1003995 100644 --- a/docs/adrs/0005-field-methods.md +++ b/docs/adrs/0005-field-methods.md @@ -4,7 +4,7 @@ Date: 2024-05-14 ## Status -Proposed +Accepted ## Context