the tokenized result of sentencepiece java lib and python lib are different #999
Comments
@thinkzhou
Yes,
Change-Id: I19e77cf5a8282bea901434041806eb102549ec0f
@frankfliu Thanks for the quick fix. Could you publish the new version to the Maven Central repository? I will test it.
I built the snapshot version locally and it passes my test. When will the release version be published to the Maven Central repository?
You can use our SNAPSHOT release for now. We expect to release 0.12.0 in mid-July.
commit 0092f8e
Author: Aziz Zayed <azayed01@gmail.com>
Date: Tue Jun 15 08:22:51 2021 -0700
Fixed truncated-normal bug

commit a6ded8c
Author: Aziz Zayed <azayed01@gmail.com>
Date: Mon Jun 14 13:33:30 2021 -0700
[pytorch] Add BigGAN demo

commit f145614
Merge: a8a1a9b ec8405b
Author: Abd-El-Aziz Zayed <48853777+AzizZayed@users.noreply.github.com>
Date: Fri Jun 11 20:45:34 2021 -0700
Merge branch 'deepjavalibrary:master' into master

commit ec8405b
Author: Abd-El-Aziz Zayed <48853777+AzizZayed@users.noreply.github.com>
Date: Fri Jun 11 14:53:59 2021 -0700
[pytorch] Add oneHot operator (deepjavalibrary#1014)
[tensoflow] Add truncated normal operation

commit 50600fd
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Fri Jun 11 14:53:43 2021 -0700
upgrade dependencies version (deepjavalibrary#1012)
Change-Id: I709938f69f21096bc5cd29a24191f0f282dcbc97

commit 3379fd2
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Fri Jun 11 14:53:29 2021 -0700
[serving] Fix flaky test (deepjavalibrary#1013)
Change-Id: I13b89e04516c59a3d28ecafd49f4f808630b22fb

commit 23157fd
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Thu Jun 10 16:31:03 2021 -0700
Enable spotbugs for java 11+ (deepjavalibrary#1010)
Change-Id: I74effbf45492a5cf50e09ba8af0223d2b1bcb5a5

commit 4f38708
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Thu Jun 10 16:30:50 2021 -0700
Fix model zoo test typo (deepjavalibrary#1009)
Change-Id: I7c0109c6e5fc0ece16288082fd830718f20ad489

commit a8a1a9b
Merge: 77809f4 30b03f4
Author: Aziz Zayed <azayed01@gmail.com>
Date: Thu Jun 10 15:16:05 2021 -0700
Merge Truncated-Normal branch

commit 77809f4
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Thu Jun 10 14:07:43 2021 -0700
Make model zoo test weekly (deepjavalibrary#1004)
Change-Id: I1c73df17cb077b9ce8905fcc2fc8bbb37b9688d8

commit 0aec8ca
Author: Abd-El-Aziz Zayed <48853777+AzizZayed@users.noreply.github.com>
Date: Thu Jun 10 12:46:16 2021 -0700
[tensoflow] Add truncated normal operation (deepjavalibrary#1005)

commit 30b03f4
Author: Aziz Zayed <azayed01@gmail.com>
Date: Wed Jun 9 01:40:33 2021 -0700
[tensoflow] Add truncated normal operation

commit d8e7e1d
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Wed Jun 9 07:55:15 2021 -0700
Fixes deepjavalibrary#999, hanlde UTF16 surrogate charactors properly. (deepjavalibrary#1003)
Change-Id: I19e77cf5a8282bea901434041806eb102549ec0f

commit b0fe73a
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Tue Jun 8 17:56:19 2021 -0700
[pytorch] Update load model jupyter notebook (deepjavalibrary#1002)
Change-Id: I1889aa93d2002e6ce02c740d2d1d3517bf586760

commit 8286930
Author: Frank Liu <frankfliu2000@gmail.com>
Date: Tue Jun 8 15:29:27 2021 -0700
[tensorflow] fix optOption usage document (deepjavalibrary#1001)
Change-Id: Ie044839cf082d63010a5c26d3f2f8833447919c6

commit a26f5b2
Author: Abd-El-Aziz Zayed <48853777+AzizZayed@users.noreply.github.com>
Date: Tue Jun 8 15:29:10 2021 -0700
Updated PyTorch Docs (deepjavalibrary#1000)
* Added auto softmax metadata for action_recognition
* Update PyTorch Docs

commit e6890f9
Author: Lanking <qingla@amazon.com>
Date: Mon Jun 7 18:25:19 2021 -0700
upgrade xgboost (deepjavalibrary#993)

commit a0dcf3a
Author: Lanking <qingla@amazon.com>
Date: Mon Jun 7 18:25:12 2021 -0700
bump up onnx runtime version (deepjavalibrary#992)
Description
I am using your Java library and the original Python library to load the xlm-roberta-base model and tokenize sentences, and the results from Java and Python differ. It looks like the Java library handles emoji (e.g. 👋) incorrectly. Could this be a bug?
Expected Behavior
The tokenized results from the Java library and the Python library should be the same.
Error Message
No Error Message
How to Reproduce?
Java code:
get result:
[▁, ������������]
Python Code:
get result:
['▁', '👋', '👋']
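For readers unfamiliar with why an emoji can come out garbled on the Java side, here is a minimal Python sketch of one plausible mechanism. It rests on an assumption not confirmed in this thread beyond the fix commit title ("handle UTF16 surrogate characters properly"): that the string was converted to bytes one UTF-16 code unit at a time, so each half of a surrogate pair was encoded separately. The helper names `utf16_code_units`, `encode_unit_naively`, and `naive_bytes` are hypothetical and exist only for this illustration; they are not part of DJL or sentencepiece.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text, as Java's String would store them."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def encode_unit_naively(unit):
    """Encode one 16-bit code unit as if it were a standalone code point.

    This is the assumed faulty step: surrogate halves (0xD800-0xDFFF) are
    not paired up first, so each one becomes its own 3-byte sequence,
    which is not valid UTF-8.
    """
    if unit < 0x80:
        return bytes([unit])
    if unit < 0x800:
        return bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def naive_bytes(text):
    """Concatenate the naive per-code-unit encodings of text."""
    return b"".join(encode_unit_naively(u) for u in utf16_code_units(text))


text = "👋👋"
print([hex(u) for u in utf16_code_units("👋")])  # one emoji, two code units
print(len(text.encode("utf-8")))                 # proper UTF-8 byte count
print(len(naive_bytes(text)))                    # naive byte count
print(naive_bytes(text).decode("utf-8", errors="replace"))
```

Characters in the Basic Multilingual Plane, such as `▁`, fit in a single code unit and pass through the naive path unchanged, which would explain why only the emoji token differs. Pairing surrogates into a full code point before encoding (what `str.encode("utf-8")` does here) avoids the problem.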
Steps to reproduce
(Paste the commands you ran that produced the error.)
What have you tried to solve it?
Environment Info
Please run the command
./gradlew debugEnv
from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below: