
Fix Windows KL divergence calculations #5273

Merged
merged 1 commit into ggerganov:master on Feb 2, 2024

Conversation

kalomaze
Contributor

@kalomaze kalomaze commented Feb 2, 2024

In #5166 I described an unexpected issue where KL divergence was segfaulting / not working on Windows.

I think I have identified the issue; it is presumably related to how the file is written / read on Windows.

From: https://stackoverflow.com/questions/26993086/what-the-point-of-using-stdios-basebinary

  • "under Unix, there is no distinction; both are identical. Under Windows, '\n' internally will be mapped to the two character sequence CR, LF (0x0D, 0x0A) externally, and 0x1A will be interpreted as an end of file when reading"

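For context, the kind of change involved looks roughly like the following. This is a minimal sketch with hypothetical helper names, not the actual diff, assuming the logits are streamed through std::ofstream / std::ifstream:

```cpp
// Minimal sketch, not the actual change in this PR: the important detail is
// std::ios::binary. Without it, Windows expands any 0x0A byte in the float data
// to 0x0D 0x0A on write, and treats a stray 0x1A as end-of-file on read.
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper names, for illustration only.
static void write_logits(const std::string & fname, const std::vector<float> & logits) {
    std::ofstream out(fname, std::ios::out | std::ios::binary); // binary mode is the fix
    const uint32_t n = (uint32_t) logits.size();
    out.write((const char *) &n, sizeof(n));
    out.write((const char *) logits.data(), n * sizeof(float));
}

static std::vector<float> read_logits(const std::string & fname) {
    std::ifstream in(fname, std::ios::in | std::ios::binary);   // must match the writer
    uint32_t n = 0;
    in.read((char *) &n, sizeof(n));
    std::vector<float> logits(n);
    in.read((char *) logits.data(), n * sizeof(float));
    return logits;
}
```
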
If I make the modifications in this PR, it allows me to use KL divergence as intended on Windows (I am testing with the same model twice for debugging):

kl_divergence: 0.40 seconds per pass - ETA 0.12 minutes

chunk        PPL          ln(PPL(Q)/PPL(base))          KL-Divergence           Same top
   1        8.1417      -0.00002 ±    0.00000      -0.00001 ±    0.00000    1.00000 ± 0.00000
   2       11.9650      -0.00002 ±    0.00000      -0.00001 ±    0.00000    1.00000 ± 0.00000
   3       12.0453      -0.00017 ±    0.00016      -0.00000 ±    0.00000    0.99869 ± 0.00131
   4       11.8829      -0.00013 ±    0.00012      -0.00001 ±    0.00000    0.99902 ± 0.00098
   5       12.6807      -0.00011 ±    0.00010      -0.00001 ±    0.00000    0.99922 ± 0.00078
   6       11.1073      -0.00009 ±    0.00008      -0.00001 ±    0.00000    0.99935 ± 0.00065
   7       11.8471      -0.00008 ±    0.00007      -0.00001 ±    0.00000    0.99944 ± 0.00056
   8       11.5785      -0.00008 ±    0.00006      -0.00001 ±    0.00000    0.99951 ± 0.00049
   9       10.6265      -0.00007 ±    0.00005      -0.00001 ±    0.00000    0.99956 ± 0.00044
  10       11.1318      -0.00006 ±    0.00005      -0.00001 ±    0.00000    0.99961 ± 0.00039
  11       10.6147      -0.00006 ±    0.00004      -0.00001 ±    0.00000    0.99964 ± 0.00036
  12       10.1512      -0.00005 ±    0.00004      -0.00001 ±    0.00000    0.99967 ± 0.00033
  13       10.0229      -0.00005 ±    0.00004      -0.00001 ±    0.00000    0.99970 ± 0.00030
  14       10.1798      -0.00006 ±    0.00010      -0.00000 ±    0.00000    0.99916 ± 0.00049
  15       10.2974      -0.00005 ±    0.00009      -0.00000 ±    0.00000    0.99922 ± 0.00045
  16       10.5658      -0.00002 ±    0.00012       0.00001 ±    0.00000    0.99926 ± 0.00042
  17       10.4636      -0.00002 ±    0.00011       0.00001 ±    0.00000    0.99931 ± 0.00040
  18       10.3634      -0.00008 ±    0.00012       0.00001 ±    0.00000    0.99891 ± 0.00049
  19       10.3194      -0.00005 ±    0.00012       0.00001 ±    0.00000    0.99856 ± 0.00055

===== KL-divergence statistics
Average:   0.000013 ±  0.000001
Median :  -0.000007
Maximum:   0.002893
KLD_99 :   0.000360
KLD_95 :   0.000142
KLD_90 :   0.000065
Minimum:  -0.000045
KLD_01 :  -0.000033
KLD_05 :  -0.000026
KLD_10 :  -0.000023

However, there is another issue with how perplexity tokenizes on Windows that is not fixed by this PR.

If you use -f instead of -bf (as was recommended), the input is tokenized differently, which leads to higher perplexity compared to Linux / WSL:

WSL: 13.4018 +/- 0.59528
Windows: 13.9301 +/- 0.62122

(-c 128 was used for both).

This is not a small, within-margin-of-error difference in perplexity.

In terms of total tokens read, it is ~9400 tokens on Windows (without -bf) vs. ~9800 on Linux. Setting -bf makes them equivalent.
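
A quick way to see the discrepancy is to read the same corpus file in text mode and in binary mode and compare the byte counts. This is a small standalone sketch (the fallback filename is just a placeholder), not code from the repository:

```cpp
// Minimal standalone sketch: compare how many bytes a file yields when read in
// text mode vs binary mode. On Linux the two counts match; on Windows they can
// differ (CR/LF translation, 0x1A treated as end-of-file), so the tokenizer ends
// up seeing different input depending on how the file was opened.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

static std::string slurp(const char * fname, std::ios::openmode mode) {
    std::ifstream in(fname, mode);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

int main(int argc, char ** argv) {
    const char * fname = argc > 1 ? argv[1] : "corpus.txt"; // placeholder filename
    const std::string as_text = slurp(fname, std::ios::in);
    const std::string as_bin  = slurp(fname, std::ios::in | std::ios::binary);
    std::cout << "text mode bytes  : " << as_text.size() << "\n";
    std::cout << "binary mode bytes: " << as_bin.size()  << "\n";
    return 0;
}
```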

Here is how the WSL KL divergence reads:

chunk        PPL          ln(PPL(Q)/PPL(base))          KL-Divergence           Same top
   1        8.1417      -0.00002 ±    0.00000      -0.00001 ±    0.00000    1.00000 ± 0.00000
   2       11.9650       0.00007 ±    0.00044       0.00003 ±    0.00000    0.99608 ± 0.00277
   3       12.0453       0.00007 ±    0.00030       0.00002 ±    0.00000    0.99739 ± 0.00185
   4       11.8819      -0.00004 ±    0.00030       0.00003 ±    0.00000    0.99706 ± 0.00170
   5       12.6798      -0.00003 ±    0.00024       0.00002 ±    0.00000    0.99765 ± 0.00136
   6       11.1075       0.00005 ±    0.00023       0.00002 ±    0.00000    0.99739 ± 0.00131
   7       11.8473       0.00004 ±    0.00020       0.00002 ±    0.00000    0.99776 ± 0.00112
   8       11.5787       0.00003 ±    0.00017       0.00001 ±    0.00000    0.99804 ± 0.00098
   9       10.6266       0.00003 ±    0.00015       0.00001 ±    0.00000    0.99826 ± 0.00087
  10       11.1319       0.00002 ±    0.00014       0.00001 ±    0.00000    0.99843 ± 0.00078
  11       10.6148       0.00002 ±    0.00013       0.00001 ±    0.00000    0.99857 ± 0.00071
  12       10.1513       0.00002 ±    0.00011       0.00001 ±    0.00000    0.99869 ± 0.00065
  13       10.0230       0.00001 ±    0.00011       0.00000 ±    0.00000    0.99879 ± 0.00060
  14       10.1799       0.00001 ±    0.00010       0.00000 ±    0.00000    0.99888 ± 0.00056
  15       10.2975       0.00001 ±    0.00009       0.00000 ±    0.00000    0.99895 ± 0.00052
  16       10.5658       0.00001 ±    0.00009       0.00000 ±    0.00000    0.99902 ± 0.00049
  17       10.4637       0.00001 ±    0.00008       0.00000 ±    0.00000    0.99908 ± 0.00046
  18       10.3641       0.00001 ±    0.00008      -0.00000 ±    0.00000    0.99913 ± 0.00044
  19       10.3198       0.00000 ±    0.00007      -0.00000 ±    0.00000    0.99917 ± 0.00041

===== KL-divergence statistics
Average:  -0.000001 ±  0.000001
Median :  -0.000009
Maximum:   0.001216
KLD_99 :   0.000225
KLD_95 :   0.000060
KLD_90 :   0.000000
Minimum:  -0.000045
KLD_01 :  -0.000033
KLD_05 :  -0.000026
KLD_10 :  -0.000023

@ggerganov ggerganov merged commit 1912211 into ggerganov:master Feb 2, 2024
53 checks passed
@Nexesenex
Contributor

Is it possible that your changes broke the HellaSwag computation a bit (with the .txt file, not the .bin)?

Example of the command used:
perplexity -m X:\text-generation-webui\models\MiquMaid-v1-70B.q3_k_m.gguf -f hellaswag_val_full.txt --hellaswag --hellaswag-tasks 1000 -ngl 100 -b 512 -mg 0 -ts 5,2

I used to get HellaSwag scores of 88-90 on 70B models (over 400 or 1000 tasks), and now it has dropped to 83-84 (same model, same quant).
