feat: add warnings in s2s for bad inputs #812
"The tokenizer must be an instance of PreTrainedTokenizerFast " | ||
"or Tokenizer" | ||
) | ||
assert str(e.value) == ( |
This assert is never executed inside the `with` clause; it has to go outside of it (we can put an `assert False` in the `with` clause and the test still passes!)
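Why the assert is skipped: this is ordinary `with` semantics rather than anything pytest-specific. Once a statement in the block raises, the rest of the block never runs, and a context manager that swallows the exception (as `pytest.raises` does for the expected type) resumes execution after the block. A minimal stdlib-only sketch, using `contextlib.suppress` as a stand-in for `pytest.raises`; `FakeTokenizer`, `check_tokenizer` and the `ValueError` type are hypothetical:

```python
from contextlib import suppress


class FakeTokenizer:
    """Hypothetical stand-in for PreTrainedTokenizerFast / Tokenizer."""


def check_tokenizer(tokenizer: object) -> None:
    """Hypothetical validator that rejects anything but FakeTokenizer."""
    if not isinstance(tokenizer, FakeTokenizer):
        raise ValueError(
            "The tokenizer must be an instance of PreTrainedTokenizerFast "
            "or Tokenizer"
        )


reached = []

# suppress() swallows the ValueError the way pytest.raises does for the
# expected type (unlike pytest.raises, it does not fail if nothing is raised).
with suppress(ValueError):
    check_tokenizer("not a tokenizer")  # raises here and leaves the block
    reached.append("after the raising call")  # skipped
    assert False  # also skipped, so it can never fail

print(reached)  # []
```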
Interesting!
So these tests were not passing before?
They were passing, it's just that this line was never reached. We can literally replace it with `assert False` and the test would still pass. To make it count, we have to put it outside of the `with` clause.
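A sketch of the two usual ways to make the message check count, assuming pytest is installed. `FakeTokenizer`, `check_tokenizer` and the `ValueError` type are hypothetical stand-ins, not the project's code:

```python
import re

import pytest

TOKENIZER_ERROR = (
    "The tokenizer must be an instance of PreTrainedTokenizerFast "
    "or Tokenizer"
)


class FakeTokenizer:
    """Hypothetical stand-in for PreTrainedTokenizerFast / Tokenizer."""


def check_tokenizer(tokenizer: object) -> None:
    """Hypothetical validator; the exception type is an assumption."""
    if not isinstance(tokenizer, FakeTokenizer):
        raise ValueError(TOKENIZER_ERROR)


def test_error_message_checked_after_block() -> None:
    with pytest.raises(ValueError) as e:
        check_tokenizer("not a tokenizer")
    # Outside the with block, so this line actually runs.
    assert str(e.value) == TOKENIZER_ERROR


def test_error_message_checked_with_match() -> None:
    # match= is applied with re.search, so escape the literal message.
    with pytest.raises(ValueError, match=re.escape(TOKENIZER_ERROR)):
        check_tokenizer("not a tokenizer")
```

The explicit equality after the block is the stricter of the two: `match=` only needs the pattern to occur somewhere in the message, while `==` fails on any extra text.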
Cool, did not know this!
Codecov Report
@@            Coverage Diff             @@
##             main     #812      +/-   ##
==========================================
+ Coverage   86.61%   86.66%   +0.04%
==========================================
  Files         196      196
  Lines       15616    15673      +57
==========================================
+ Hits        13526    13583      +57
  Misses       2090     2090
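As a cross-check, the report's percentages follow from Hits / Lines. A small sketch, assuming Codecov's default of two-decimal precision with rounding down; it reproduces the reported figures, including +0.04% rather than the 0.05 obtained by subtracting the two rounded percentages, which suggests the delta is computed from the unrounded values:

```python
import math
from fractions import Fraction


def pct(hits: int, lines: int) -> Fraction:
    """Exact coverage percentage: hits / lines * 100."""
    return Fraction(hits, lines) * 100


def floor_hundredths(value: Fraction) -> int:
    """Round a percentage down to hundredths (assumed Codecov default)."""
    return math.floor(value * 100)


def fmt(hundredths: int) -> str:
    """Format a non-negative count of hundredths as 'X.YY%'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


main_hits, main_lines = 13526, 15616  # main
pr_hits, pr_lines = 13583, 15673      # PR #812

base = floor_hundredths(pct(main_hits, main_lines))
head = floor_hundredths(pct(pr_hits, pr_lines))
# Delta taken from the unrounded values, then rounded down.
delta = floor_hundredths(pct(pr_hits, pr_lines) - pct(main_hits, main_lines))

print(fmt(base), fmt(head), "+" + fmt(delta))  # 86.61% 86.66% +0.04%
print(main_lines - main_hits, pr_lines - pr_hits)  # 2090 2090 (misses)
```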
Love the start of adding tests! I had one question about the data embs column.
@@ -170,4 +169,13 @@ def watch(

    generation_splits_set.add(Split[split])

# A model of the correct type is required if we need to generate
if generation_splits:
    assert isinstance(
Awesome!
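The hunk above only enforces the model type when generation is actually requested. A hedged sketch of that guard; `ExpectedSeq2SeqModel` and `validate_generation_model` are hypothetical names, and where the PR uses `assert isinstance(...)`, this sketch raises `ValueError` instead because asserts are stripped under `python -O`:

```python
from typing import Iterable, Optional


class ExpectedSeq2SeqModel:
    """Hypothetical stand-in for the model class watch() requires."""


def validate_generation_model(
    model: Optional[object], generation_splits: Iterable[str]
) -> None:
    """Require a model of the expected type only if generation is requested."""
    splits = list(generation_splits)
    if not splits:
        # No generation requested, so any (or no) model is acceptable.
        return
    if not isinstance(model, ExpectedSeq2SeqModel):
        raise ValueError(
            f"A model of type {ExpectedSeq2SeqModel.__name__} is required to "
            f"generate for splits {splits}, got {type(model).__name__}"
        )
```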
"The tokenizer must be an instance of PreTrainedTokenizerFast " | ||
"or Tokenizer" | ||
) | ||
assert str(e.value) == ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting!
"The tokenizer must be an instance of PreTrainedTokenizerFast " | ||
"or Tokenizer" | ||
) | ||
assert str(e.value) == ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these tests were not passing before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Made a few small tweaks.
Refactoring/Robustification https://app.shortcut.com/galileo/story/9394/add-warning-for-bad-inputs-response-template-and-data-emb-col
- If `response_template` is not passed as a list of int (token ids), raise an explicit exception.
- If `data_emb_col` cannot be found in the df, first fall back to the default cols `text` and `target`, and raise an exception if it is still not found (should not happen though).

Current error if `data_embs_col` is not specified (or specified wrong)
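The two checks listed above could look roughly like this. A hedged sketch, not the code from the PR: the helper names `validate_response_template` and `resolve_data_emb_col`, the `ValueError` type, the warning on fallback, and trying `text` before `target` are all assumptions. `columns` can be `df.columns` for a pandas DataFrame, since `in` works on a column Index:

```python
import warnings
from typing import Any, List, Optional, Sequence

# Fallback order is an assumption: the PR only says "text and target".
DEFAULT_EMB_COLS = ("text", "target")


def validate_response_template(response_template: Any) -> List[int]:
    """Raise an explicit error unless response_template is a list of token ids."""
    is_token_ids = isinstance(response_template, list) and all(
        # bool is a subclass of int, so exclude it explicitly.
        isinstance(tok, int) and not isinstance(tok, bool)
        for tok in response_template
    )
    if not is_token_ids:
        raise ValueError(
            "response_template must be a list of token ids (List[int]), "
            f"got {response_template!r}"
        )
    return response_template


def resolve_data_emb_col(requested: Optional[str], columns: Sequence[str]) -> str:
    """Return the embedding column, falling back to DEFAULT_EMB_COLS."""
    if requested is not None and requested in columns:
        return requested
    for fallback in DEFAULT_EMB_COLS:
        if fallback in columns:
            if requested is not None:
                warnings.warn(
                    f"data_emb_col {requested!r} not found, "
                    f"falling back to {fallback!r}"
                )
            return fallback
    # Should be rare: neither the requested column nor any default exists.
    raise ValueError(
        f"Could not find data_emb_col {requested!r} or any of "
        f"{DEFAULT_EMB_COLS} in columns {list(columns)}"
    )
```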