-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update exception handling #175
Conversation
Why these changes are being introduced: * Updating exception handling for the Transformer class for more explicit flow control. How this addresses that need: * Create exceptions module * Add SkippedRecordEvent exception * Move DeletedRecordEvent to exceptions module * Update Datacite.get_optional_fields to call SkippedRecordEvent Side effects of this change: * None Relevant ticket(s): * NA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Would be an outright approve, but curious about two things:
- giving
SkippedRecordEvent
exceptions a message (optionally) and logging that here - tweak
__next__
logic such that a successful record is the default, fall-through path
except SkippedRecordEvent: | ||
self.skipped_record_count += 1 | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SkippedRecordEvent
looks great, slots right in.
For posterity's sake, just commenting that in our discussion, we had also talked about how more granular exceptions, e.g. invalid records, records with runtime-y errors, or otherwise, could extend SkippedRecordEvent
to still trigger this code path of "skipping" a record.
Lastly, related to this comment in the PR,
"Logging can be added as necessary depending on what triggered the skip."
If within code we set a custom message on the SkippedRecordEvent
, maybe this would be good place to log that? That would allow setting the exception + message deep in code, confident it would bubble up here and get logged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unable to comment on the lines below, as they did not change, but thoughts on updating the logic to treat a successful record as the default path?
I know this will get touched on in orchestration updates, but while touching this __next__
method, thought it could be a good time to nudge it.
e.g.
if not record:
self.skipped_record_count += 1
continue
self.transformed_record_count += 1
return record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on the exception message and I was frustrated with the repetition in that method so thank you very much for such a simple refactor!
@@ -54,7 +55,7 @@ def get_optional_fields(self, xml: Tag) -> dict | None: | |||
if self.valid_content_types([content_type]): | |||
fields["content_type"] = [content_type] | |||
else: | |||
return None | |||
raise SkippedRecordEvent(source_record_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. But see comment above about adding messaging close to the error/situation, confident it would get logged higher up.
* Update Transformer.__next__ method for more logical flow * Add message param to SkippedRecordEvent exception * Update call of SkippedRecordEvent in Datacite class
@ghukill Made changes in response to your review. Also, didn't realize it's possible to change the target branch on an existing PR so I'll keep this one open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, but left some comments on the exceptions.
As noted, it feels like a time to iron out what they'll look and feel like, if they do become more central to things.
I could be persuaded that no changes are needed, but curious your thoughts!
transmogrifier/exceptions.py
Outdated
class SkippedRecordEvent(Exception): # noqa: N818 | ||
"""Exception raised for records that should be skipped. | ||
|
||
Attributes: | ||
source_record_id: The ID for the source record. | ||
""" | ||
|
||
def __init__(self, source_record_id: str | None, message: str) -> None: | ||
self.source_record_id = source_record_id | ||
self.message = message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some nuance to consider here, but feels like the time to think about these exceptions as a) they are now going into main
, and b) might be fairly central to new orchestration patterns when implemented.
I would propose two things:
- We can instantiate these exceptions just with a string message if we like, just like normal exceptions, and have this represented when you stringify the exception object
- All extra properties are optional, unless we decide they should be required
Example of how this could be done:
class SkippedRecordEvent(Exception):
""""""
def __init__(self, message:str|None=None, source_record_id:str|None=None):
super().__init__(message) # calls base Exception constructor
self.source_record_id = source_record_id
By default, the base Exception
class is expecting a positional argument message
that is then used in the __str__
method to stringify it.
This allows some behavior like this:
# acts like other exceptions, allows string message as first and only argument
In [19]: e = SkippedRecordEvent("Hello world!")
In [20]: str(e)
Out[20]: 'Hello world!'
In [21]: e
Out[21]: __main__.SkippedRecordEvent('Hello world!')
In [22]: e.source_record_id
# None passed
# also supports extra properties if defined
In [23]: e2 = SkippedRecordEvent("Hello world!", source_record_id="acb123")
In [24]: str(e2)
Out[24]: 'Hello world!'
# source_record_id also present if we want to use it
In [25]: e2.source_record_id
Out[25]: 'acb123'
# not ideal, but like default exceptions, can even raise it without messages
In [26]: e3 = SkippedRecordEvent()
In [27]: str(e3)
Out[27]: 'None' # string of None
In [28]: e3.source_record_id
# also None
Two additional considerations, maybe now, maybe later:
- Does this have implications for
DeletedRecordEvent
? Do we want to update patterns so it has similar default, predictable behavior? - Do we want more granular exception classes like
InvalidRecordError
that extendSkippedRecordEvent
, such that we could raise those more meaningful, granular exceptions from code, but still have a singleexcept SkippedRecordEvent: ...
logic at the higher level to group skipping behavior for all those child Exception types?
Both seem like they could wait. And for that matter, the suggestions above could too. But felt like a good time to raise them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on the proposed exception formatting!
Re: additional considerations, those feel better to address during the full orchestration discussion. It doesn't feel like we have full use cases for either yet (correct me if I'm wrong)but I would like to discuss both in the more comprehensive context of the orchestration refactor.
If that sounds OK, we can create an issue with those 2 points under something like Exception refactoring
and as related items come up they could be added as comments. Open to other ways of doing it though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. If SkippedRecordEvent
is updated now to allow for default-y Exception behavior discussed above, then the rest feels like something we could suss out in future work. There aren't that many instances of raising these from code, so would be easy to ctrl + f
them and then think about what new exceptions might make sense to raise there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commit pushed and issue created!
@@ -54,7 +55,8 @@ def get_optional_fields(self, xml: Tag) -> dict | None: | |||
if self.valid_content_types([content_type]): | |||
fields["content_type"] = [content_type] | |||
else: | |||
return None | |||
message = f'Record skipped based on content type: "{content_type}"' | |||
raise SkippedRecordEvent(source_record_id, message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above about possible reworking of these custom exceptions.
Even if we don't go that route, I think perhaps the message
should be the first argument, and positional, and then source_record_id
be a named argument is optional.
And agreed on figuring this out however many commits it takes! |
* Shift param order for SkippedRecordEvent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Agree with the changes that were proposed by @ghukill. Still need to review the latest discussion/comments posted above!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks for this intricate scaffolding work that is backwards/forwards compatible. Think it's establishing a nice pattern for more granular exception raising as we progress.
def __init__(self, message: str | None = None, source_record_id: str | None = None): | ||
super().__init__(message) | ||
self.source_record_id = source_record_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -54,7 +55,8 @@ def get_optional_fields(self, xml: Tag) -> dict | None: | |||
if self.valid_content_types([content_type]): | |||
fields["content_type"] = [content_type] | |||
else: | |||
return None | |||
message = f'Record skipped based on content type: "{content_type}"' | |||
raise SkippedRecordEvent(message, source_record_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I might suggest as we revist raising these in the future that we have a convention of positional message
and then named arguments (e.g. source_record_id=source_record_id
), but not blocking now as this definitely works too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 , I agree with that pattern as well going forward
* Update Transformer.__next__ method for more logical flow * Add message param to SkippedRecordEvent exception * Update call of SkippedRecordEvent in Datacite class
* Shift param order for SkippedRecordEvent
Purpose and background context
Adding more explicit flow control through exception handling in the
Transformer
class. Previously, records were skipped whenget_optional_fields
returnedNone
. Adding aSkippedRecordEvent
exception that can handle records that should be skipped for an invalidcontent_type
or records with significant structural issues that raise the risk of unknown exceptions later in the process. This is a minor update to the orchestration that should aid us with the field method refactor.As we proceed with the refactor in other transforms, use the
SkippedRecordEvent
exception wheneverget_optional_fields
returnsNone
. Logging can be added as necessary depending on what triggered the skip.How can a reviewer manually see the effects of these changes?
The
Datacite
class was updated to raise theSkippedRecordEvent
exception illustrating that tests such astest_zenodo_skips_records_with_invalid_content_types
still pass.Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)