Fix Windows KL divergence calculations #5273
Merged
In #5166 I detailed an unexpected issue where KL divergence was segfaulting / not working.
I think I have identified the cause; presumably it is related to how the file is written / read (text vs. binary mode).
From: https://stackoverflow.com/questions/26993086/what-the-point-of-using-stdios-basebinary
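To make the cause concrete, here is a minimal, self-contained sketch of the text-vs-binary issue the Stack Overflow answer describes (illustrative only, not the exact change in this PR; the file name and payload are made up):

```cpp
// On Windows, streams opened in text mode translate '\n' <-> "\r\n" and treat
// 0x1A (Ctrl-Z) as end-of-file when reading, which corrupts binary payloads
// such as saved logits. Opening with std::ios::binary disables both behaviours.
#include <fstream>
#include <vector>

int main() {
    std::vector<float> logits = {0.1f, -2.5f, 3.0f}; // hypothetical payload

    // Write the raw bytes; std::ios::binary prevents newline translation.
    {
        std::ofstream out("logits.bin", std::ios::out | std::ios::binary);
        out.write(reinterpret_cast<const char *>(logits.data()),
                  logits.size() * sizeof(float));
    }

    // Read them back with the same flag. Without it, any 0x0D/0x0A byte in the
    // data can be altered and a stray 0x1A byte truncates the read on Windows.
    std::vector<float> back(logits.size());
    {
        std::ifstream in("logits.bin", std::ios::in | std::ios::binary);
        in.read(reinterpret_cast<char *>(back.data()),
                back.size() * sizeof(float));
    }
    return 0;
}
```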
With the modifications in this PR, I can use KL divergence as intended on Windows (I am testing with the same model twice for debugging):
However, there is another issue with how perplexity tokenizes on Windows that is not fixed by this PR.
If you do not use -bf and instead use -f (as was recommended), it will tokenize differently, which leads to higher perplexity when compared to Linux / WSL:
WSL: 13.4018 +/- 0.59528
Windows: 13.9301 +/- 0.62122
(`-c 128` was used for both.) This is not a small margin-of-error difference in ppl.
In terms of total tokens read, it's ~9400 tokens on Windows (without -bf) vs ~9800 on Linux; setting -bf makes them equivalent.
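For the tokenization difference, my guess (an assumption, not something I have verified in the perplexity code) is that it is again a text-vs-binary read issue: in text mode on Windows, "\r\n" is collapsed to "\n", so the tokenizer sees a different byte stream than on Linux. A small sketch that makes the size difference visible (`prompt.txt` is a hypothetical file):

```cpp
// Read the same file in text mode and in binary mode and compare sizes.
// On Windows the sizes differ if the file contains CRLF line endings;
// on Linux they are identical.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

static std::string read_file(const char * path, std::ios::openmode mode) {
    std::ifstream in(path, mode);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

int main() {
    const std::string text_bytes   = read_file("prompt.txt", std::ios::in);
    const std::string binary_bytes = read_file("prompt.txt", std::ios::in | std::ios::binary);

    std::cout << "text mode bytes:   " << text_bytes.size()   << "\n";
    std::cout << "binary mode bytes: " << binary_bytes.size() << "\n";
    return 0;
}
```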
Here is how the WSL KL divergence reads: