The initial token is always empty. #367
Comments
Please review the issue reporting guidelines in #239 and provide a better description of the issue you are observing.
I added more details based on your guidelines; I hope that helps.
The token with ID 1 is a special token called BOS (beginning of sequence), one of the two special tokens required in the token vocabulary. The second is EOS (end of sequence), with ID 2. So this is normal behaviour.
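As a rough illustration of why token 1 always shows up first, here is a minimal sketch of a tokenizer wrapper that prepends the start marker before the prompt tokens. The names `kBosToken` and `with_bos` are hypothetical and are not part of llama.cpp; only the ID value 1 comes from this thread.

```cpp
#include <vector>

// Hypothetical constant for illustration: ID 1 is the start marker discussed above.
constexpr int kBosToken = 1;

// Returns a copy of `prompt_tokens` with the start marker prepended
// when `add_bos` is true, and an unchanged copy otherwise.
std::vector<int> with_bos(const std::vector<int>& prompt_tokens, bool add_bos) {
    std::vector<int> output;
    output.reserve(prompt_tokens.size() + 1);
    if (add_bos) {
        output.push_back(kBosToken);  // this is the "empty" first token in the log
    }
    output.insert(output.end(), prompt_tokens.begin(), prompt_tokens.end());
    return output;
}
```

Calling `with_bos({4103, 924, 310}, true)` would yield a four-element vector whose first entry is 1, which prints as an empty string because the marker has no visible text.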
@PriNova I see, thanks for your answer. I learned something today!
You can make token 1 go away by commenting out this line in utils.cpp:

    if (bos) {
        // output.push_back(1);
    }

It's probably more correct with it there, but removing it doesn't seem to break anything (at least if you only submit one whole document per session). As for the leading space, look at your initial tokens above:
The space is inside the first token, so it is being printed. Technically, if the first token starts with a space, the output could skip over it when printing.
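The idea of skipping the leading space at print time could look roughly like the sketch below. `render_tokens` is a hypothetical helper, not a llama.cpp function; it assumes the token pieces are already available as strings and drops a single leading space from the first non-empty piece only.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not llama.cpp API): joins token pieces into the text
// that would be printed, dropping one leading space from the first
// non-empty piece. Empty pieces, such as the start marker, are skipped.
std::string render_tokens(const std::vector<std::string>& pieces) {
    std::string text;
    bool at_start = true;
    for (const std::string& piece : pieces) {
        if (piece.empty()) {
            continue;  // nothing visible to print
        }
        if (at_start && piece.front() == ' ') {
            text.append(piece, 1, std::string::npos);  // skip the leading space
        } else {
            text += piece;
        }
        at_start = false;
    }
    return text;
}
```

With the pieces from this issue (`""`, `" Trans"`, `"cript"`, `" of"`, `" a"`, `" dialog"`), the helper returns `Transcript of a dialog` without the leading space. Whether dropping that space is desirable is a separate question, since it is part of the first token itself.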
The leading space is intentional and a result of lines 232 to 233 in d5850c5.
I'm not sure whether we should skip printing the first character (the space) or not.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hello,
While trying the chat-with-Bob example, I noticed that the first token I get is always empty.
4103 -> ' Trans'
924 -> 'cript'
310 -> ' of'
263 -> ' a'
7928 -> ' dialog'
So the result is this:
There's a small space at the beginning of the text. This alone might significantly affect the quality of the output, which is why I decided to post this issue.
I'm on Windows 10, using WSL to emulate the Linux environment (main.exe is not as good as the Linux main at the moment).
I'm using a file that is the result of all those manipulations:
Here's the .sh command (7B_CHAT_Bob.sh):
Everything in this repository is up to date, as I run git pull every time I launch PowerShell.