Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add warnings in s2s for bad inputs #812

Merged
merged 17 commits into from
Dec 4, 2023
Merged

Conversation

bogdan-galileo
Copy link
Contributor

@bogdan-galileo bogdan-galileo commented Nov 29, 2023

Refactoring/Robustification https://app.shortcut.com/galileo/story/9394/add-warning-for-bad-inputs-response-template-and-data-emb-col

  • Ifresponse_template is not passed as a list of int (token ids), raise an explicit exception
  • If data_emb_col cannot be found in the df, first fallback to default cols text and target, and raise an exception if still not found (should not happen though)

Current error if data_embs_col is not specified (or specified wrong)

Screenshot 2023-11-30 at 11 42 36 AM

@bogdan-galileo bogdan-galileo requested review from dcaustin33 and a team as code owners November 29, 2023 22:13
"The tokenizer must be an instance of PreTrainedTokenizerFast "
"or Tokenizer"
)
assert str(e.value) == (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is not tested in the with clause, it has to be done outside (we can put an assert False in the with clause and the test passes!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these tests were not passing before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were passing, it's just that this line was not taken into account. We can literally replace it with assert False and the test would still pass. To make it count we have to put it outside of the with clause

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2023-11-30 at 11 20 14 AM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool did not know this!

@bogdan-galileo bogdan-galileo marked this pull request as draft November 29, 2023 22:24
@codecov-commenter
Copy link

codecov-commenter commented Nov 30, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (699fb19) 86.61% compared to head (9dfc393) 86.66%.

Files Patch % Lines
dataquality/integrations/seq2seq/core.py 77.77% 2 Missing ⚠️
dataquality/utils/vaex.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #812      +/-   ##
==========================================
+ Coverage   86.61%   86.66%   +0.04%     
==========================================
  Files         196      196              
  Lines       15616    15673      +57     
==========================================
+ Hits        13526    13583      +57     
  Misses       2090     2090              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bogdan-galileo bogdan-galileo marked this pull request as ready for review November 30, 2023 15:37
Copy link
Contributor

@jonathangomesselman jonathangomesselman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the start of adding tests! I had one question about the data embs question

dataquality/integrations/seq2seq/core.py Show resolved Hide resolved
@@ -170,4 +169,13 @@ def watch(

generation_splits_set.add(Split[split])

# A model of the correct type is required if we need to generate
if generation_splits:
assert isinstance(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

"The tokenizer must be an instance of PreTrainedTokenizerFast "
"or Tokenizer"
)
assert str(e.value) == (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

"The tokenizer must be an instance of PreTrainedTokenizerFast "
"or Tokenizer"
)
assert str(e.value) == (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these tests were not passing before?

tests/loggers/test_seq2seq.py Outdated Show resolved Hide resolved
@bogdan-galileo bogdan-galileo marked this pull request as draft November 30, 2023 19:37
@bogdan-galileo bogdan-galileo marked this pull request as ready for review December 2, 2023 03:43
Copy link
Contributor

@elboy3 elboy3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! made a few small tweaks

@elboy3 elboy3 enabled auto-merge (squash) December 4, 2023 21:29
@elboy3 elboy3 merged commit addcb0b into main Dec 4, 2023
3 checks passed
@elboy3 elboy3 deleted the feat/add-warnings-s2s branch December 4, 2023 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants