[Tokenizer] fix char offset #2137
@andreabrduque FYI
@@ -389,4 +389,29 @@ public void testTruncationAndPaddingForPairInputs() throws IOException {
Assert.assertEquals(encoding.getIds().length, 8);
}
}

@Test
public void testSpecialTokenHandling() throws IOException {
We might not need this test; the test above it already covers special characters.
Codecov Report
Base: 72.08% // Head: 71.40% // Decreases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## master #2137 +/- ##
============================================
- Coverage 72.08% 71.40% -0.69%
- Complexity 5126 6292 +1166
============================================
Files 473 624 +151
Lines 21970 27847 +5877
Branches 2351 3004 +653
============================================
+ Hits 15838 19883 +4045
- Misses 4925 6503 +1578
- Partials 1207 1461 +254
Thanks for this one. It was indeed returning UTF-8 byte offsets instead of the correct char spans :)
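For readers unfamiliar with the byte-versus-char distinction mentioned above, here is a minimal sketch of how a UTF-8 byte offset can be mapped to a Java `char` index. The class `OffsetDemo` and the helper `byteToCharOffset` are hypothetical names chosen for illustration; they are not part of the tokenizer's API and this is not necessarily how the PR implements the fix.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: OffsetDemo and its methods are hypothetical, not DJL code.
final class OffsetDemo {

    // Number of UTF-8 bytes needed to encode one Unicode code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) {
            return 1;
        }
        if (codePoint < 0x800) {
            return 2;
        }
        if (codePoint < 0x10000) {
            return 3;
        }
        return 4;
    }

    // Maps an offset counted in UTF-8 bytes to an index into the Java String
    // (counted in UTF-16 chars). Assumes byteOffset falls on a code point boundary.
    static int byteToCharOffset(String text, int byteOffset) {
        int bytes = 0;
        int index = 0;
        while (index < text.length() && bytes < byteOffset) {
            int codePoint = text.codePointAt(index);
            bytes += utf8Length(codePoint);
            index += Character.charCount(codePoint);
        }
        return index;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 au lait"; // '\u00e9' takes 2 bytes in UTF-8, 1 char in Java
        System.out.println("chars: " + text.length());
        System.out.println("bytes: " + text.getBytes(StandardCharsets.UTF_8).length);

        // The word "au" spans bytes [6, 8) but chars [5, 7).
        int start = byteToCharOffset(text, 6);
        int end = byteToCharOffset(text, 8);
        System.out.println(text.substring(start, end)); // prints "au"
    }
}
```

Using the raw byte span `[6, 8)` with `substring` would instead yield `"u "`, which is the kind of mismatch that appears as soon as the input contains non-ASCII characters.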
Description
Fixed char offset issues: offsets were returned as UTF-8 byte positions instead of char spans.
Fixes #2112