[Tokenizer] fix char offset #2137

lanking520 · 2022-11-05T00:29:08Z

Description

Fixes the character offsets returned by the tokenizer.
Fixes #2112

lanking520 · 2022-11-05T01:16:59Z

frankfliu · 2022-11-05T01:36:15Z

extensions/tokenizers/src/test/java/ai/djl/huggingface/tokenizers/HuggingFaceTokenizerTest.java

@@ -389,4 +389,29 @@ public void testTruncationAndPaddingForPairInputs() throws IOException {
            Assert.assertEquals(encoding.getIds().length, 8);
        }
    }
+
+    @Test
+    public void testSpecialTokenHandling() throws IOException {


We might not need this test; the test above it already covers special characters.

codecov-commenter · 2022-11-05T02:12:26Z

Codecov Report

Base: 72.08% // Head: 71.40% // Decreases project coverage by -0.68% ⚠️

Coverage data is based on head (67cd1cf) compared to base (bb5073f).
Patch coverage: 71.54% of modified lines in pull request are covered.

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #2137      +/-   ##
============================================
- Coverage     72.08%   71.40%   -0.69%     
- Complexity     5126     6292    +1166     
============================================
  Files           473      624     +151     
  Lines         21970    27847    +5877     
  Branches       2351     3004     +653     
============================================
+ Hits          15838    19883    +4045     
- Misses         4925     6503    +1578     
- Partials       1207     1461     +254

Impacted Files	Coverage Δ
api/src/main/java/ai/djl/modality/cv/Image.java	`69.23% <ø> (-4.11%)`	⬇️
...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java	`76.00% <ø> (ø)`
...rc/main/java/ai/djl/modality/cv/output/Joints.java	`71.42% <ø> (ø)`
.../main/java/ai/djl/modality/cv/output/Landmark.java	`100.00% <ø> (ø)`
...main/java/ai/djl/modality/cv/output/Rectangle.java	`72.41% <0.00%> (ø)`
...i/djl/modality/cv/translator/BigGANTranslator.java	`21.42% <0.00%> (-5.24%)`	⬇️
.../modality/cv/translator/ImageFeatureExtractor.java	`0.00% <0.00%> (ø)`
.../ai/djl/modality/cv/translator/YoloTranslator.java	`27.77% <0.00%> (+18.95%)`	⬆️
...modality/cv/translator/wrapper/FileTranslator.java	`44.44% <ø> (ø)`
...y/cv/translator/wrapper/InputStreamTranslator.java	`44.44% <ø> (ø)`
... and 557 more

andreabrduque · 2022-11-09T09:30:51Z

@andreabrduque FYI

Uh, thanks for this one. It was indeed returning UTF-8 byte offsets instead of the right char spans :)
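
To illustrate the mismatch described in this comment, here is a minimal sketch of how a UTF-8 byte offset differs from a Java char offset once the text contains non-ASCII characters. The class `CharOffsetSketch` and the method `byteToCharOffset` are hypothetical names made up for this example; they are not part of DJL, and this is not necessarily how the PR implements the fix.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not DJL code): maps UTF-8 byte offsets to Java char offsets. */
public final class CharOffsetSketch {

    private CharOffsetSketch() {}

    /**
     * Converts a UTF-8 byte offset into a Java (UTF-16) char offset.
     * Assumes well-formed text and a byteOffset that falls on a code point boundary.
     */
    public static int byteToCharOffset(String text, int byteOffset) {
        int bytes = 0;
        int chars = 0;
        while (chars < text.length() && bytes < byteOffset) {
            int codePoint = text.codePointAt(chars);
            bytes += utf8Length(codePoint);
            chars += Character.charCount(codePoint);
        }
        return chars;
    }

    /** Number of bytes a single code point occupies in UTF-8. */
    private static int utf8Length(int codePoint) {
        if (codePoint < 0x80) {
            return 1;
        }
        if (codePoint < 0x800) {
            return 2;
        }
        if (codePoint < 0x10000) {
            return 3;
        }
        return 4;
    }

    public static void main(String[] args) {
        // "café latte": the 'é' takes two bytes in UTF-8 but one Java char.
        String text = "caf\u00e9 latte";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes vs " + text.length() + " chars");

        // "latte" starts at byte 6 but at char index 5.
        int charStart = byteToCharOffset(text, 6);
        System.out.println(text.substring(charStart));
    }
}
```

Using a byte offset directly as a `String` index shifts every span after the first multi-byte character, which matches the symptom reported here.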

lanking520 requested review from zachgk and frankfliu as code owners November 5, 2022 00:29

frankfliu approved these changes Nov 5, 2022

fix char offset

67cd1cf

lanking520 force-pushed the pt branch from 494e1ac to 67cd1cf Compare November 5, 2022 01:13

frankfliu reviewed Nov 5, 2022

lanking520 merged commit b50d7fc into deepjavalibrary:master Nov 5, 2022
