Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update exception handling #175

Merged
merged 3 commits into from
May 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions transmogrifier/exceptions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
class DeletedRecordEvent(Exception): # noqa: N818
"""Exception raised for records with a deleted status.

Attributes:
timdex_record_id: The TIMDEX record ID (not the source record ID) for the record.
"""

def __init__(self, timdex_record_id: str) -> None:
self.timdex_record_id = timdex_record_id


class SkippedRecordEvent(Exception): # noqa: N818
"""Exception raised for records that should be skipped.

Attributes:
source_record_id: The ID for the source record.
"""

def __init__(self, message: str | None = None, source_record_id: str | None = None):
super().__init__(message)
self.source_record_id = source_record_id
ehanson8 marked this conversation as resolved.
Show resolved Hide resolved
Comment on lines +19 to +21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

12 changes: 0 additions & 12 deletions transmogrifier/helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -132,15 +132,3 @@ def validate_date_range(
end_date,
)
return False


class DeletedRecordEvent(Exception): # noqa: N818
"""Exception raised for records with a deleted status.

Attributes:
timdex_record_id: The TIMDEX record ID (not the source record ID) for the record

"""

def __init__(self, timdex_record_id: str) -> None:
self.timdex_record_id = timdex_record_id
ehanson8 marked this conversation as resolved.
Show resolved Hide resolved
16 changes: 10 additions & 6 deletions transmogrifier/sources/transformer.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@
# should not be a security issue.
import transmogrifier.models as timdex
from transmogrifier.config import SOURCES
from transmogrifier.helpers import DeletedRecordEvent, generate_citation, validate_date
from transmogrifier.exceptions import DeletedRecordEvent, SkippedRecordEvent
from transmogrifier.helpers import generate_citation, validate_date

if TYPE_CHECKING:
from collections.abc import Iterator
Expand Down Expand Up @@ -68,11 +69,14 @@ def __next__(self) -> timdex.TimdexRecord:
except DeletedRecordEvent as error:
self.deleted_records.append(error.timdex_record_id)
continue
if record:
self.transformed_record_count += 1
return record
self.skipped_record_count += 1
continue
except SkippedRecordEvent:
self.skipped_record_count += 1
continue
Comment on lines +72 to +74
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SkippedRecordEvent looks great, slots right in.

For posterity's sake, just commenting that in our discussion, we had also talked about how more granular exceptions, e.g. invalid records, records with runtime-y errors, or otherwise, could extend SkippedRecordEvent to still trigger this code path of "skipping" a record.

Lastly, related to this comment in the PR,

"Logging can be added as necessary depending on what triggered the skip."

If within code we set a custom message on the SkippedRecordEvent, maybe this would be good place to log that? That would allow setting the exception + message deep in code, confident it would bubble up here and get logged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to comment on the lines below, as they did not change, but thoughts on updating the logic to treat a successful record as the default path?

I know this will get touched on in orchestration updates, but while touching this __next__ method, thought it could be a good time to nudge it.

e.g.

if not record:
    self.skipped_record_count += 1
    continue
self.transformed_record_count += 1
return record  

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the exception message and I was frustrated with the repetition in that method so thank you very much for such a simple refactor!

if not record:
self.skipped_record_count += 1
continue
self.transformed_record_count += 1
return record

@final
def transform_and_write_output_files(self, output_file: str) -> None:
Expand Down
4 changes: 3 additions & 1 deletion transmogrifier/sources/xml/datacite.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
from bs4 import Tag # type: ignore[import-untyped]

import transmogrifier.models as timdex
from transmogrifier.exceptions import SkippedRecordEvent
from transmogrifier.helpers import validate_date, validate_date_range
from transmogrifier.sources.xmltransformer import XMLTransformer

Expand Down Expand Up @@ -54,7 +55,8 @@ def get_optional_fields(self, xml: Tag) -> dict | None:
if self.valid_content_types([content_type]):
fields["content_type"] = [content_type]
else:
return None
message = f'Record skipped based on content type: "{content_type}"'
raise SkippedRecordEvent(message, source_record_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I might suggest as we revist raising these in the future that we have a convention of positional message and then named arguments (e.g. source_record_id=source_record_id), but not blocking now as this definitely works too!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea, too!

Copy link
Contributor Author

@ehanson8 ehanson8 May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , I agree with that pattern as well going forward

else:
logger.warning(
"Datacite record %s missing required Datacite field resourceType",
Expand Down
Loading