Update exception handling #175

Merged
merged 3 commits into from
May 21, 2024

Conversation

ehanson8
Contributor

Purpose and background context

This PR adds more explicit flow control through exception handling in the Transformer class. Previously, records were skipped when get_optional_fields returned None. The new SkippedRecordEvent exception covers records that should be skipped because of an invalid content_type, as well as records with significant structural issues that raise the risk of unknown exceptions later in the process. This is a minor update to the orchestration that should help with the field method refactor.

As we proceed with the refactor in other transforms, raise the SkippedRecordEvent exception wherever get_optional_fields currently returns None. Logging can be added as necessary depending on what triggered the skip.
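
A minimal sketch of the difference, assuming a simplified stand-in for the transform: `parse_fields`, `transform_record`, `transform_all`, and `VALID_TYPES` are hypothetical names for illustration, not the project's code. The point it tries to show is that a skip raised deep in a helper reaches the caller without each intermediate layer checking for None.

```python
from __future__ import annotations


class SkippedRecordEvent(Exception):
    """Simplified stand-in for the exception this PR adds."""


VALID_TYPES = {"dataset", "article"}  # hypothetical allow-list


def parse_fields(raw: dict) -> dict:
    """Hypothetical helper standing in for get_optional_fields."""
    content_type = raw.get("content_type")
    if content_type not in VALID_TYPES:
        # Raise instead of returning None, so callers need no None check.
        raise SkippedRecordEvent(f"invalid content type: {content_type!r}")
    return {"content_type": [content_type]}


def transform_record(raw: dict) -> dict:
    """Intermediate layer: no skip-handling code is needed here."""
    fields = parse_fields(raw)
    return {"id": raw["id"], **fields}


def transform_all(raws: list[dict]) -> tuple[list[dict], int]:
    """Collect transformed records and count the skipped ones."""
    transformed, skipped = [], 0
    for raw in raws:
        try:
            transformed.append(transform_record(raw))
        except SkippedRecordEvent:
            skipped += 1
    return transformed, skipped
```

With the None-returning approach, `transform_record` would need its own `if fields is None` branch; here the only skip handling lives in `transform_all`.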

How can a reviewer manually see the effects of these changes?

The Datacite class was updated to raise the SkippedRecordEvent exception, illustrating that tests such as test_zenodo_skips_records_with_invalid_content_types still pass.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

  • NA

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* Updating exception handling for the Transformer class for more explicit flow control.

How this addresses that need:
* Create exceptions module
* Add SkippedRecordEvent exception
* Move DeletedRecordEvent to exceptions module
* Update Datacite.get_optional_fields to call SkippedRecordEvent

Side effects of this change:
* None

Relevant ticket(s):
* NA
Contributor

@ghukill ghukill left a comment

Looks good! Would be an outright approve, but curious about two things:

  1. giving SkippedRecordEvent exceptions a message (optionally) and logging that here
  2. tweak __next__ logic such that a successful record is the default, fall-through path

transmogrifier/exceptions.py (resolved)
transmogrifier/helpers.py (resolved)
Comment on lines +72 to +74
except SkippedRecordEvent:
    self.skipped_record_count += 1
    continue
Contributor

The SkippedRecordEvent looks great, slots right in.

For posterity's sake, just commenting that in our discussion, we had also talked about how more granular exceptions, e.g. invalid records, records with runtime-y errors, or otherwise, could extend SkippedRecordEvent to still trigger this code path of "skipping" a record.

Lastly, related to this comment in the PR,

"Logging can be added as necessary depending on what triggered the skip."

If within code we set a custom message on the SkippedRecordEvent, maybe this would be good place to log that? That would allow setting the exception + message deep in code, confident it would bubble up here and get logged.
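
A rough sketch of that idea, using the constructor signature that was eventually merged in this PR. `deep_helper` and `process` are hypothetical names, and the log level and format are assumptions rather than the project's conventions.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


class SkippedRecordEvent(Exception):
    """Simplified stand-in; the message is passed to the base Exception."""

    def __init__(
        self, message: str | None = None, source_record_id: str | None = None
    ) -> None:
        super().__init__(message)
        self.source_record_id = source_record_id


def deep_helper(record_id: str, content_type: str) -> dict:
    """Hypothetical field method that decides a record must be skipped."""
    if content_type != "dataset":
        # The message is written here, close to the reason for the skip.
        raise SkippedRecordEvent(
            f'Record skipped based on content type: "{content_type}"',
            source_record_id=record_id,
        )
    return {"content_type": [content_type]}


def process(records: list[tuple[str, str]]) -> int:
    """Single catch site: log whatever message was set where the skip happened."""
    skipped = 0
    for record_id, content_type in records:
        try:
            deep_helper(record_id, content_type)
        except SkippedRecordEvent as exc:
            skipped += 1
            logger.debug("Skipped %s: %s", exc.source_record_id, exc)
    return skipped
```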

Contributor

I'm unable to comment on the lines below, as they did not change, but thoughts on updating the logic to treat a successful record as the default path?

I know this will get touched on in orchestration updates, but while touching this __next__ method, thought it could be a good time to nudge it.

e.g.

if not record:
    self.skipped_record_count += 1
    continue
self.transformed_record_count += 1
return record  
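
For context, here is how that fragment might sit inside a complete iterator. This is a sketch under assumptions, not the real Transformer: `MiniTransformer` and its `transform` method are hypothetical, and only the two counters are taken from the snippets in this PR.

```python
from __future__ import annotations

from collections.abc import Iterator


class SkippedRecordEvent(Exception):
    """Simplified stand-in for the exception added in this PR."""


class MiniTransformer:
    """Hypothetical, pared-down iterator; not the real Transformer class."""

    def __init__(self, source_records: Iterator[dict]) -> None:
        self.source_records = source_records
        self.transformed_record_count = 0
        self.skipped_record_count = 0

    def __iter__(self) -> MiniTransformer:
        return self

    def transform(self, source: dict) -> dict | None:
        """Hypothetical transform: raise to skip, or return None (legacy path)."""
        if source.get("content_type") != "dataset":
            raise SkippedRecordEvent(f"invalid content type for {source['id']}")
        if not source.get("title"):
            return None
        return {"id": source["id"], "title": source["title"]}

    def __next__(self) -> dict:
        while True:
            # StopIteration from next() propagates and ends the iteration.
            source = next(self.source_records)
            try:
                record = self.transform(source)
            except SkippedRecordEvent:
                self.skipped_record_count += 1
                continue
            if not record:
                self.skipped_record_count += 1
                continue
            # Success is the fall-through path: nothing above matched.
            self.transformed_record_count += 1
            return record
```

Every non-success case exits early with `continue`, so the last two lines of `__next__` only run for a record that transformed cleanly.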

Contributor Author

Good call on the exception message and I was frustrated with the repetition in that method so thank you very much for such a simple refactor!

@@ -54,7 +55,7 @@ def get_optional_fields(self, xml: Tag) -> dict | None:
         if self.valid_content_types([content_type]):
             fields["content_type"] = [content_type]
         else:
-            return None
+            raise SkippedRecordEvent(source_record_id)
Contributor

Looks good. But see comment above about adding messaging close to the error/situation, confident it would get logged higher up.

* Update Transformer.__next__ method for more logical flow
* Add message param to SkippedRecordEvent exception
* Update call of SkippedRecordEvent in Datacite class
@ehanson8 ehanson8 changed the base branch from field-method-refactor to main May 17, 2024 17:18
@ehanson8
Contributor Author

@ghukill Made changes in response to your review. Also, didn't realize it's possible to change the target branch on an existing PR so I'll keep this one open

Contributor

@ghukill ghukill left a comment

Looking good, but left some comments on the exceptions.

As noted, it feels like a time to iron out what they'll look and feel like, if they do become more central to things.

I could be persuaded that no changes are needed, but curious your thoughts!

Comment on lines 12 to 21
class SkippedRecordEvent(Exception):  # noqa: N818
    """Exception raised for records that should be skipped.

    Attributes:
        source_record_id: The ID for the source record.
    """

    def __init__(self, source_record_id: str | None, message: str) -> None:
        self.source_record_id = source_record_id
        self.message = message
Contributor

I think there is some nuance to consider here, but feels like the time to think about these exceptions as a) they are now going into main, and b) might be fairly central to new orchestration patterns when implemented.

I would propose two things:

  1. We can instantiate these exceptions just with a string message if we like, just like normal exceptions, and have this represented when you stringify the exception object
  2. All extra properties are optional, unless we decide they should be required

Example of how this could be done:

class SkippedRecordEvent(Exception):
    """Exception raised for records that should be skipped."""

    def __init__(self, message: str | None = None, source_record_id: str | None = None):
        super().__init__(message)  # calls base Exception constructor
        self.source_record_id = source_record_id

The base Exception class stores its positional arguments in args; when a single message is passed, the __str__ method returns that message as the string form of the exception.

This allows some behavior like this:

# acts like other exceptions, allows string message as first and only argument
In [19]: e = SkippedRecordEvent("Hello world!")

In [20]: str(e)
Out[20]: 'Hello world!'

In [21]: e
Out[21]: __main__.SkippedRecordEvent('Hello world!')

In [22]: e.source_record_id
# None passed

# also supports extra properties if defined
In [23]: e2 = SkippedRecordEvent("Hello world!", source_record_id="acb123")

In [24]: str(e2)
Out[24]: 'Hello world!'

# source_record_id also present if we want to use it
In [25]: e2.source_record_id
Out[25]: 'acb123'

# not ideal, but like default exceptions, can even raise it without messages
In [26]: e3 = SkippedRecordEvent()

In [27]: str(e3)
Out[27]: 'None' # string of None

In [28]: e3.source_record_id
# also None

Two additional considerations, maybe now, maybe later:

  1. Does this have implications for DeletedRecordEvent? Do we want to update patterns so it has similar default, predictable behavior?
  2. Do we want more granular exception classes like InvalidRecordError that extend SkippedRecordEvent, such that we could raise those more meaningful, granular exceptions from code, but still have a single except SkippedRecordEvent: ... logic at the higher level to group skipping behavior for all those child Exception types?

Both seem like they could wait. And for that matter, the suggestions above could too. But felt like a good time to raise them.
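
To make the second consideration concrete, here is a small sketch. `InvalidRecordError` is the name floated above; `UnsupportedContentTypeError`, `classify`, and `run` are hypothetical and exist only for illustration.

```python
from __future__ import annotations


class SkippedRecordEvent(Exception):
    """Simplified stand-in using the proposed signature."""

    def __init__(
        self, message: str | None = None, source_record_id: str | None = None
    ) -> None:
        super().__init__(message)
        self.source_record_id = source_record_id


class InvalidRecordError(SkippedRecordEvent):
    """Hypothetical child: the record failed structural validation."""


class UnsupportedContentTypeError(SkippedRecordEvent):
    """Hypothetical child: the record's content type is not handled."""


def classify(record: dict) -> dict:
    """Raise a granular exception close to where the problem is detected."""
    if "id" not in record:
        raise InvalidRecordError("record has no identifier")
    if record.get("content_type") == "poster":
        raise UnsupportedContentTypeError(
            'unsupported content type: "poster"', source_record_id=record["id"]
        )
    return record


def run(records: list[dict]) -> tuple[list[dict], list[str]]:
    """One except clause groups the skipping behavior for every child type."""
    kept, skip_reasons = [], []
    for record in records:
        try:
            kept.append(classify(record))
        except SkippedRecordEvent as exc:
            skip_reasons.append(f"{type(exc).__name__}: {exc}")
    return kept, skip_reasons
```

Because both children inherit from SkippedRecordEvent, the higher-level handler does not change when a new child type is introduced.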

Contributor Author

Agree on the proposed exception formatting!

Re: additional considerations, those feel better to address during the full orchestration discussion. It doesn't feel like we have full use cases for either yet (correct me if I'm wrong), but I would like to discuss both in the more comprehensive context of the orchestration refactor.

If that sounds OK, we can create an issue with those 2 points under something like Exception refactoring and as related items come up they could be added as comments. Open to other ways of doing it though!

Contributor

Sounds good to me. If SkippedRecordEvent is updated now to allow for default-y Exception behavior discussed above, then the rest feels like something we could suss out in future work. There aren't that many instances of raising these from code, so would be easy to ctrl + f them and then think about what new exceptions might make sense to raise there.

Contributor Author

@ehanson8 ehanson8 May 17, 2024

New commit pushed and issue created!

@@ -54,7 +55,8 @@ def get_optional_fields(self, xml: Tag) -> dict | None:
         if self.valid_content_types([content_type]):
             fields["content_type"] = [content_type]
         else:
-            return None
+            message = f'Record skipped based on content type: "{content_type}"'
+            raise SkippedRecordEvent(source_record_id, message)
Contributor

See comments above about possible reworking of these custom exceptions.

Even if we don't go that route, I think the message should be the first, positional argument, and source_record_id should be an optional named argument.

@ehanson8
Contributor Author

And agreed on figuring this out however many commits it takes!

@ghukill ghukill self-requested a review May 17, 2024 20:05
* Shift param order for SkippedRecordEvent
@ehanson8 ehanson8 mentioned this pull request May 17, 2024
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Looks good to me! Agree with the changes that were proposed by @ghukill. Still need to review the latest discussion/comments posted above!

@jonavellecuerdo jonavellecuerdo self-requested a review May 17, 2024 20:24
Contributor

@ghukill ghukill left a comment

Looks good! Thanks for this intricate scaffolding work that is backwards/forwards compatible. Think it's establishing a nice pattern for more granular exception raising as we progress.

Comment on lines +19 to +21
    def __init__(self, message: str | None = None, source_record_id: str | None = None):
        super().__init__(message)
        self.source_record_id = source_record_id
Contributor

👍

@@ -54,7 +55,8 @@ def get_optional_fields(self, xml: Tag) -> dict | None:
         if self.valid_content_types([content_type]):
             fields["content_type"] = [content_type]
         else:
-            return None
+            message = f'Record skipped based on content type: "{content_type}"'
+            raise SkippedRecordEvent(message, source_record_id)
Contributor

👍

I might suggest, as we revisit raising these in the future, that we adopt a convention of a positional message followed by named arguments (e.g. source_record_id=source_record_id), but it's not blocking now as this definitely works too!
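
One possible way to enforce that convention, rather than rely on habit, is a keyword-only marker in the signature. This is a sketch of an option and not what was merged: the `*` in `__init__` is an assumption, and `try_positional` is a hypothetical helper that only demonstrates the resulting behavior.

```python
from __future__ import annotations


class SkippedRecordEvent(Exception):
    """Variant of the merged signature with a keyword-only marker (assumption)."""

    def __init__(
        self, message: str | None = None, *, source_record_id: str | None = None
    ) -> None:
        super().__init__(message)
        self.source_record_id = source_record_id


def try_positional() -> str:
    """Show that a second positional argument is no longer accepted."""
    try:
        SkippedRecordEvent("skipped", "abc123")
    except TypeError:
        return "rejected"
    return "accepted"
```

The trade-off is that existing call sites passing source_record_id positionally would need to be updated before such a change could land.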

Contributor

I like this idea, too!

Contributor Author

@ehanson8 ehanson8 May 21, 2024

👍 , I agree with that pattern as well going forward

@ehanson8 ehanson8 merged commit 5f91403 into main May 21, 2024
5 checks passed
@ehanson8 ehanson8 deleted the TIMX-284-datacite-field-method-refactor branch May 21, 2024 13:28
@ehanson8 ehanson8 restored the TIMX-284-datacite-field-method-refactor branch May 21, 2024 13:28
ehanson8 added a commit that referenced this pull request May 21, 2024
* Update Transformer.__next__ method for more logical flow
* Add message param to SkippedRecordEvent exception
* Update call of SkippedRecordEvent in Datacite class
ehanson8 added a commit that referenced this pull request May 21, 2024
* Shift param order for SkippedRecordEvent
3 participants