Adding some imatrix tools #5302

Merged
ikawrakow merged 2 commits into master from ik/imatrix_tools on Feb 4, 2024
Conversation

ikawrakow
Contributor

I was playing around with various imatrix calculations and needed some additional functionality that is currently not available in the imatrix tool. The result is this PR, which adds the following functionality:

  • --continue file_name If specified on the command line, the imatrix data in file_name will be loaded, and the subsequent calculation will accumulate on top of it.
  • --combine comma_separated_list_of_files If specified on the command line, the imatrix tool will load and combine the imatrix data from the listed files. The file names are comma separated, so, sorry, no commas (or spaces) are allowed in file names. The combined data is then saved (either in imatrix.dat or in the file specified via the -o option), and the program terminates. No calculation is done when this option is specified.
  • --from-chunk N After tokenizing the supplied dataset, the first N token chunks are discarded before proceeding with the calculation. For instance, if one has done a calculation with 100 chunks using some_training_data and wants to continue from there, one can use ./imatrix -m some_model -f some_training_data --continue previous_imatrix --from-chunk 100 (a combined example is sketched just after this list).
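To make the interplay of the three options concrete, here is a sketch of a possible workflow (model and file names are placeholders, not files from this PR):

# compute the first 100 chunks and save the partial result
./imatrix -m some_model -f some_training_data --chunks 100 -o imatrix_first_100.dat
# later: load that result, skip the 100 chunks already processed, and keep accumulating
./imatrix -m some_model -f some_training_data --continue imatrix_first_100.dat --from-chunk 100
# or: compute the next chunks separately and merge the two files (no new calculation is done)
./imatrix -m some_model -f some_training_data --from-chunk 100 --chunks 100 -o imatrix_next_100.dat
./imatrix --combine imatrix_first_100.dat,imatrix_next_100.dat -o imatrix_all.dat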

I was playing around with the C4 datasets, which are huge, so they take a very long time to tokenize (e.g., 1.5 minutes for c4-validation.00000-of-00008.json on my computer). I was bothered by that, so I was tempted to add an option to tokenize just a portion of the data. But to be done correctly, one needs to deal with UTF-8, so I did not implement it for now.

@Nexesenex
Contributor

Nexesenex commented Feb 3, 2024

Thank you, Ikawrakow, I really needed the first and third features!

If I understand properly, --continue file_name loads an iMatrix, and --from-chunk N allows the continuation of the loaded iMatrix?

As for --combine comma_separated_list_of_files, could you please give some examples of the use cases, and of the methodology employed for combining the iMatrix data? It's unclear to me.

@sorasoras

Thank you, Ikawrakow, I really needed the first and third features!

If I understand properly, --continue file_name loads an iMatrix, and --from-chunk N allows the continuation of the loaded iMatrix?

As for --combine comma_separated_list_of_files, could you please give some examples of the use cases, and of the methodology employed for combining the iMatrix data? It's unclear to me.

If my guess is correct, I think the second feature can combine two different imatrix results from two separate processes.

@ikawrakow
Contributor Author

If I understand properly, --continue file_name loads an iMatrix, and --from-chunk N allows the continuation of the loaded iMatrix?

Yes. If you have an imatrix calculated from, e.g., the first 50 chunks of wiki.train.raw and stored in imatrix_1_50.dat, and you want to add N more chunks, you can use

./imatrix -m some_model -f wiki.train.raw --continue imatrix_1_50.dat --from-chunk 50 --chunks N

Or, you can store the result from the next N chunks separately, and then use the --combine option to combine the two results:

./imatrix -m some_model -f wiki.train.raw --from-chunk 50 --chunks N -o imatrix_50_100.dat
./imatrix --combine imatrix_1_50.dat,imatrix_50_100.dat

As an example for using --combine (apart from the example above): suppose you have calculated an imatrix using an English training dataset, and another imatrix using a French training dataset. Let the results be in imatrix_en.dat and imatrix_fr.dat. You can use

./imatrix --combine imatrix_en.dat,imatrix_fr.dat -o imatrix_en_plus_fr.dat

to combine them and store the result in imatrix_en_plus_fr.dat.
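As a further note (the imatrix_de.dat file name below is hypothetical): --combine takes a comma-separated list of files, so more than two can be merged at once, and if -o is omitted the result goes to the default imatrix.dat:

./imatrix --combine imatrix_en.dat,imatrix_fr.dat,imatrix_de.dat
# the merged data ends up in imatrix.dat because no -o was given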

@Nexesenex
Contributor

Nexesenex commented Feb 3, 2024

That's great!

I'd indeed love to combine English and French iMatrix files to improve quant quality in both languages, especially with Miqu out!

But won't such a combination be a sort of "blur" between the values of the first and second iMatrix files, partly negating the benefit of each, and won't this be amplified if we combine more than 2 files?

Also, should the combined iMatrix files preferably have the same number of chunks and the same ctx size?


Also, I just noticed that if I make the iMatrix on a Yi 34Bx2 MoE fp16, I get this:

[1]3.2549,[2]7.9056,[3]8.2650,[4]6.4178,
save_imatrix: stored collected data after 10 chunks in Y:\iMatrix\TomGrc_FusionNet_34Bx2_MoE_v0.1-b2054-Q8_0.iMatrix_Wiki_c32_ch500.dat
[5]7.2867,[6]7.2855,[7]7.2059,[8]7.1582,[9]7.3868,
save_imatrix: stored collected data after 20 chunks in Y:\iMatrix\TomGrc_FusionNet_34Bx2_MoE_v0.1-b2054-Q8_0.iMatrix_Wiki_c32_ch500.dat
[10]7.5491,[11]8.1775,[12]8.2683,[13]7.4605,[14]7.7463,
save_imatrix: stored collected data after 30 chunks in Y:\iMatrix\TomGrc_FusionNet_34Bx2_MoE_v0.1-b2054-Q8_0.iMatrix_Wiki_c32_ch500.dat

It seems that the count is not done properly: maybe the number of models/experts of the MoE acts as a multiplier on the reported chunk count (i.e., the real count would be the displayed one divided by that number).

[245]13.2935,[246]13.3249,[247]13.4073,[248]13.4650,[249]13.5338,
save_imatrix: stored collected data after 500 chunks in Y:\iMatrix\TomGrc_FusionNet_34Bx2_MoE_v0.1-b2054-Q8_0.iMatrix_Wiki_c32_ch500.dat

save_imatrix: stored collected data after 500 chunks in Y:\iMatrix\TomGrc_FusionNet_34Bx2_MoE_v0.1-b2054-Q8_0.iMatrix_Wiki_c32_ch500.dat.at_500
[250]13.5561,[251]13.6481,[252]13.6523,[253]13.6452,[254]13.6817,

But once the autosave count is reached, it keeps crunching as it should, which means the problematic count is likely part of the autosave feature.

@ikawrakow ikawrakow merged commit 5ed26e1 into master Feb 4, 2024
54 of 56 checks passed
@ikawrakow ikawrakow deleted the ik/imatrix_tools branch February 4, 2024 08:40
@sorasoras

@ikawrakow
I was thinking you could add a feature that allows randomizing the context length in the imatrix process, so you can basically add randomization into the calculation.
When I merge results from the same data with different context lengths (16, 128, 512), it does improve my results.
This combining of results really works for me.
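For reference, here is a sketch of how one might try this with the options from this PR (file names are placeholders; it assumes the usual -c/--ctx-size option controls the chunk length used by imatrix, as it does for perplexity):

for ctx in 16 128 512; do
    ./imatrix -m some_model -f some_training_data -c $ctx -o imatrix_c$ctx.dat
done
./imatrix --combine imatrix_c16.dat,imatrix_c128.dat,imatrix_c512.dat -o imatrix_mixed_ctx.dat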

@Nexesenex
Contributor

Nexesenex commented Feb 5, 2024

I second @sorasoras. Having a random ctx size, or simply multiple user-chosen ones (like 32, 64, 128, 256, 512, each multiplied by the number of chunks), to make the iMatrix could be interesting to test!

@ikawrakow
Contributor Author

Can you both give some specific examples of what you did and how this improved your results? I did a quick try with a manually prepared mix of context lengths, and it didn't seem to help.

@sorasoras

Can you both give some specific examples of what you did and how this improved your results? I did a quick try with a manually prepared mix of context lengths, and it didn't seem to help.

Basically, I have a friend who fine-tuned 1.8B, 7B, and 13B Qwen models into a Chinese-to-Japanese translation machine with the same set of data.
I tried to use imatrix to get a better quantization of these models.
I started with the 1.8B because you can compute its imatrix quickly.
I first compared contexts 500 and 1024; C500 gave me a better result, judged by comparing the translation output against a much larger model like the 13B Q8.
Then I tried context 16 after reading
#5006 (comment)
but there was some degradation with C16 compared to C500.
Since you can combine two imatrices, why not combine them and give it a try? It did give a more or less better result.
So I tried to combine results from different context sizes:
16, 32, 64, 128, 500, 600, 700, 800, 900, 1000, 1500, and so on.
It gives me a much better translation in terms of readability, getting quite close to the 13B Q8 that I tested against.
That got me thinking of something I read:

https://www.microsoft.com/en-us/research/quarterly-brief/jan-2024-brief/articles/improving-reasoning-in-language-models-with-laser-layer-selective-rank-reduction/

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* imatrix: adding --combine and --continue-from

* imatrix: be able to start from a specific chunk

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* imatrix: adding --combine and --continue-from

* imatrix: be able to start from a specific chunk

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>