Correct karatsuba multiplication of univariate polynomials for different degree polynomials #10255
Comments
comment:1
Same example with my patch. Pure Karatsuba:
TODO: Take a more careful look at the correctness of the code.
comment:2
The code is too naive for different-sized polynomials. Even with my improvements I get this:

- Two degree-849 polynomials, no threshold: 274750 additions, 87729 products
- One degree-849 polynomial and one degree-449 polynomial: 472812 additions, 218649 products

Experimental data:
The shorter product is more expensive! I have a patch right now to solve this, but I have to clean it up first.
comment:3
Further improvements for polynomials of different sizes: if f is of degree m-1 and g is of degree n-1 with m < n, consider g as g = g_0 + g_1·x^m + g_2·x^{2m} + …, where each block g_i has degree less than m.
Compute the products f·g_i with Karatsuba and reconstruct f·g from the shifted partial products (a sketch of this strategy follows the timings below). With this strategy, for polynomials of degrees 849 and 449, I get: 214507 additions, 39942 products. Far better than the previous method. Anyway, this is not optimal, and it is hard to measure the recursion overhead. I have taken this strategy from old versions of the GMP documentation.

Some timings (Debian, 64 bits). All polynomials have been generated with random_element without further parameters. Pure Karatsuba without threshold, for better comparison with the old code; with an appropriate threshold the timings would be slightly better. Note that for rings like QQ[x] and GF(2)[x], Sage uses faster specialized libraries (FLINT, NTL and the like).

- Degree f: 1000, g: 1000
- Degree f: 849, g: 449
- Degree f: 3, g: 5
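The following is a minimal Python sketch of this block strategy, operating on plain coefficient lists; `poly_add`, `karatsuba` and `unbalanced_mul` are illustrative names, not the functions from the actual patch.

```python
def poly_add(a, b):
    """Coefficient-wise sum of two coefficient lists."""
    if len(a) < len(b):
        a, b = b, a
    return [a[i] + (b[i] if i < len(b) else 0) for i in range(len(a))]

def karatsuba(f, g):
    """Balanced Karatsuba product of two coefficient lists."""
    if not f or not g:
        return []
    n = max(len(f), len(g))
    if n == 1:
        return [f[0] * g[0]]
    m = n // 2
    f0, f1 = f[:m], f[m:]          # f = f0 + x^m * f1
    g0, g1 = g[:m], g[m:]          # g = g0 + x^m * g1
    z0 = karatsuba(f0, g0)
    z2 = karatsuba(f1, g1)
    z1 = karatsuba(poly_add(f0, f1), poly_add(g0, g1))
    # f*g = z0 + x^m * (z1 - z0 - z2) + x^(2m) * z2
    res = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(z0):
        res[i] += c
        res[i + m] -= c
    for i, c in enumerate(z2):
        res[i + 2 * m] += c
        res[i + m] -= c
    for i, c in enumerate(z1):
        res[i + m] += c
    return res

def unbalanced_mul(f, g):
    """Multiply lists of very different lengths: split the longer one
    into blocks the size of the shorter one, multiply each block by the
    shorter one with balanced Karatsuba, and accumulate the shifted
    partial products."""
    if len(f) > len(g):
        f, g = g, f
    m = len(f)
    if m == 0:
        return []
    res = [0] * (len(f) + len(g) - 1)
    for start in range(0, len(g), m):
        for i, c in enumerate(karatsuba(f, g[start:start + m])):
            res[start + i] += c
    return res
```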
TODO:
comment:4
All doctests pass, and I am trying to make the doctests independent of future changes to the default threshold. But I get the following difference, which puzzles me.
comment:5
I have eliminated karatsuba_sum and karatsuba_diff, so there are fewer function calls and fewer if branches in the code (we know, and can predict at each step, the length of every list). I have also eliminated most slicing of lists, so fewer objects are created: the recursion does not operate on slices but on the original list plus some pointers to the slice we are working on. With these improvements the code is about 10% faster, and I think readability improves as well.

For number fields of degree 2 and 3, the code is about three times slower than PARI. For number fields of degree >= 6, Sage seems faster than PARI. For dense bivariate polynomials in QQ[x][y], the multiplication starts to be faster than in QQ[x,y] at degree 16; for dense polynomials of degree 100, QQ[x][y] is 45x faster than QQ[x,y]. Of course, for non-dense polynomials, multivariate multiplication is much, much faster.

Some doctests added to compare Karatsuba with generic multiplication, and a doctest for the non-commutative ring case. The QQbar failure is still present.
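A minimal sketch of the "pointers instead of slices" idea, shown on the classical product for brevity (`mul_window` is a hypothetical name; the actual patch applies the same trick inside the Karatsuba recursion):

```python
def mul_window(f, fs, fl, g, gs, gl):
    """Multiply the coefficient windows f[fs:fs+fl] and g[gs:gs+gl]
    without building the slices: the function receives the original
    lists plus a start offset and a length for each factor."""
    res = [0] * (fl + gl - 1)
    for i in range(fl):
        a = f[fs + i]
        for j in range(gl):
            res[i + j] += a * g[gs + j]
    return res

# The window [3, 4, 5] of f times the whole of g, with no slice copies:
f = [1, 2, 3, 4, 5, 6]
g = [7, 8, 9]
print(mul_window(f, 2, 3, g, 0, 3))   # [21, 52, 94, 76, 45]
```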
comment:6
Hello! First of all, I want to say that it's great that you are looking at this part of the Sage library and trying to improve it. From reading your ticket progress, it seems that you might benefit from some help with this. Have you considered advertising this problem on the developers' list, sage-devel? Perhaps someone else is interested in this, too. In any case, here are a couple of comments:
All that said, while this patch looks very interesting, I think there are a lot of things that need to be sorted out. By the way, I am not quite sure whether anyone else is currently reading this ticket, but at least during the next week I should have a lot of time to look at this. Best wishes, Sebastian
comment:7
Sebastian, thanks for looking at this. I feel that the ticket is ready for review, but there is a doctest failure and I wanted to understand why. My impression is that the doctest should be changed, but I have not had enough time to dig into it. Replying to @sagetrac-spancratz:
My primary interest was univariate polynomial rings over number fields of degree around 30-40. I was considering using the specific class I added in #8558 because, for number field base rings, the slowness really comes from adding number field elements to a Python 0. But then I realised that the Karatsuba multiplication is not well implemented when the polynomials have different degrees, so I changed the Cython code, improving it here and there.
That is why the default threshold is zero: no classical multiplication at all. But I have added a ring-dependent parameter to change this value; univariate polynomial rings now have a method set_karatsuba_threshold to deal with this issue. Rings in which addition is faster than multiplication also benefit from Karatsuba, since the number of additions is also subquadratic in the degree of the polynomials.
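A usage sketch of the threshold interface described here (the method names are those added by the patch; the ring and degrees are made up):

```python
sage: K.<a> = NumberField(x^3 - 2)
sage: R.<y> = PolynomialRing(K)
sage: R.set_karatsuba_threshold(8)   # classical multiplication for small polynomials
sage: R.karatsuba_threshold()
8
sage: f = R.random_element(degree=30)
sage: g = R.random_element(degree=20)
sage: f._mul_karatsuba(g) == f._mul_generic(g)
True
```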
By default, unless the user explicitly calls _mul_karatsuba, multiplication for inexact rings is the classical multiplication algorithm; I have not changed this. Also, if a class (ZZ[x], GF(5)[x], etc.) has its own multiplication code, my patch does not change it.
Thanks! If you have any questions about why I changed this or that, or have further suggestions/improvements, do not hesitate to ask. Luis
comment:8
One can actually get a speedup of more than a factor of 10 by using PARI. The following function converts a polynomial over a number field to a PARI object:
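(The function itself was lost in the migration to GitHub; the following is a plausible sketch along the same lines, not the original code. In a Sage session, `pari` is available as a global.)

```python
def nf_poly_to_pari(f):
    """Convert a polynomial f over a number field into a PARI polynomial
    whose coefficients are POLMODs (hypothetical reconstruction)."""
    # Note: the outer polynomial variable must differ from the PARI
    # variable used for the number field's defining polynomial.
    return pari([pari(c) for c in f.list()]).Polrev()

# The product can then be computed entirely inside PARI:
# h = nf_poly_to_pari(f) * nf_poly_to_pari(g)
```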
Now let's try your example:
Since PARI is written completely in C, it's obvious why one could expect such a speedup. Obviously, writing specialized code to deal with number field polynomials would be even better.
comment:9
Obviously you did not try the computations with my patch; it is still much slower than PARI for small number fields (as I acknowledged in the ticket, PARI is around three times faster for degree-two extensions):
If you take into account a naive transformation from Sage data to PARI and back, the total time is worse than the mul_karatsuba time. So if a specialized class were to use PARI, this transformation would have to be taken into account. This PARI advantage disappears for bigger number fields.
Note: for absolute number fields, I usually set a Karatsuba threshold of 8.
comment:10
Hi Luis, I'm sorry for the delay in my reply. Also, thank you Jeroen for adding the explicit timings. Let me just go through a couple of things:
I don't really agree. It is certainly in a rather useful state, but it requires more people looking at it, using it, and providing feedback.
"If you take into account a naive transformation from sage data to pari and back, the total time is worse than the mul_karatsuba time. So if a specialiczed class would use PARI, this transformation has to be taken into account." is not true. In short, the code for polynomials over number fields should then be changed so that the internal representation simply was the PARI object --- no converting back and forth in between the computations whatsoever! I have looked at the currently only remaining failing test case in QQbar. Unfortunately, I am very puzzled by what could possibly cause this and don't have any help to offer right now. By the way, I am sorry if this might sound a little harsh in places. Of course, as said, I am still very happy to keep looking at this etc. Best wishes, Sebastian |
comment:11
Hi Sebastian, Replying to @sagetrac-spancratz:
This is my fault. Of course, I have done much more testing (random polynomials over different rings and a range of degrees), but it is not reflected in the ticket. I will upload the scripts I used for testing.
OK, I agree, but this ticket is not about polynomials over number fields, but about multiplication of Polynomial_generic_dense_field elements. A new class for polynomials over number fields would be good, but that is a different ticket. Also, the claim that PARI is better for small degrees is not so clear to me, at least not without some nontrivial work. K.random_element creates polynomials with very small coefficients and no denominators; such polynomials are not the most common user case, in my opinion. For these polynomials, it seems to me that memory management is the problem: creating and destroying lots of number field elements. If you take not-so-big coefficients and, especially, if your coefficients do have denominators, then PARI is not better, due to FLINT integers in Sage. Note: computations made on an old laptop.
OK, this is a bad example. I needed some rings that produce genuine Polynomial_generic_dense_field elements apart from number fields. I also wanted to compare the algorithm with Singular's multivariate multiplication, which I think is considered fairly good. Note that ZZ[x][y] polynomials are currently arrays of mpz_t elements.
Answered above.
QQbar is an exact ring, so it will use Karatsuba multiplication. But computations are done inexactly when it is safe to do so. Since the Karatsuba code for different-degree polynomials has changed, the inexact intermediate results have also changed. I think that this is the reason for the doctest change here.
Do not worry; certainly your comments and Jeroen's are really welcome. Luis
comment:12
Hi Luis,
Well, which base ring is it that you are interested in? I don't really think there is all that much point in squeezing […] By the way, perhaps other developers might disagree with me. Of course, these are just words... Here's an example:
Here, the current code takes about 5s, whereas one can easily […] By the way, I believe a similar trick works for all polynomials […] In the opposite direction, when the degree of the number field is […]
Hopefully, my code snippet above convinces you that the current […]
I do not think this is true. ZZ['x']['y'] is different from ZZ['x','y'] […] Anyway, all that said, let's be more positive again and look […]
Best wishes, Sebastian
tests over some rings
comment:13
Attachment: test_karatsuba.py.gz

Replying to @sagetrac-spancratz:
The fact that I am interested in polynomials over number fields does not invalidate the fact that the current generic Karatsuba method must be changed, because it is too inefficient. Do you think that mul_karatsuba is OK when it is slower than the generic multiplication for degree-1500 polynomials?
It does improve across all rings (or at least is not worse) because:
This last reason is very important. For instance, take a pair of polynomials over ZZ, one of degree twice the other. _mul_karatsuba is slower than _mul_generic independently of the degree. This is a bug in my opinion.
With patch:
I do not care if FLINT is lightning fast here. This example is just a symptom that _mul_karatsuba is not doing its job compared with _mul_generic. Moreover, multiplying a polynomial of degree 4000 by one of degree 1500 is much slower than multiplying a polynomial of degree 4000 by one of degree 2000, which is not something I am comfortable with; with the patch they cost around the same time. More examples for more rings, taking two polynomials, one of degree 27 and the other of degree 31. The first line is the time taken with my patch, the second line with Sage 4.6.
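For concreteness, this is the kind of comparison being made (a sketch; degrees as in the example above, timings machine-dependent):

```python
sage: R.<x> = ZZ[]
sage: f = R.random_element(degree=4000)
sage: g = R.random_element(degree=2000)
sage: timeit('f._mul_generic(g)')     # generic classical multiplication
sage: timeit('f._mul_karatsuba(g)')   # generic Karatsuba, bypassing FLINT
```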
I do not think that this is a small improvement.
I'll do that.
Ok, but you are disregarding all other cases.
I have not really changed _mul_generic, only moved the code to avoid duplication. Calling list is already in the original code; list is also used for addition, by the way. But I realize something: I have to take care of in-place operations in the code. If base ring elements are immutable (as I think they are expected to be), this will just call non-in-place operations. If they are mutable, multiplication of polynomials will change the input polynomials, which is wrong. I will change this.
I will take a look. Best regards, Luis
comment:14
For the number field specific case, see #10591
comment:15
Hi Luis, I am terribly sorry for not writing on this thread again earlier; after returning to the UK following the Sage bug days in Seattle, I was busy catching up on some other things. Anyway...
Of course, it is great that (via this patch!) there might soon be some code which speeds up the generic implementation. But to be honest, personally I don't think the generic implementation is all that important; after all, it is only a fall-back used when no one has mustered the energy to implement something more specific.
I feel this is a bit of a shame. As rjf mentioned on sage-devel, for many base rings you will be able to come up with something much, much faster than this (by comparison!) small improvement.
Of course, you are right; this is ridiculous. Implicit zero-padding (or even explicit padding, if you assume only references are stored) should be nearly free, so a 4000-by-1500 product shouldn't cost noticeably more than a 4000-by-2000 product. (Of course, ideally it should cost less!)
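A tiny sketch of why explicit padding is nearly free (names made up): appending references to a shared zero costs a few pointer stores, negligible next to the product itself.

```python
def padded(coeffs, length, zero=0):
    """Extend a coefficient list to the given length with references to
    a single shared zero element; no ring elements are copied."""
    return coeffs + [zero] * (length - len(coeffs))

f = [1, 2, 3]          # stand-in for a degree-1499 coefficient list
print(padded(f, 6))    # [1, 2, 3, 0, 0, 0]
```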
Obviously, we can agree to disagree here. In absolute terms, all of the above improvements are fantastic! In relative terms, looking at what would be possible just by writing base ring specific code, however, I am confident you could do better.
Obviously, and deliberately so. But please let me know which ring you actually, mathematically, care about; I would not be surprised if we could come up with something noticeably faster! By the way, I am not sure whether anyone else is looking at this bit of code. While I can't promise how much spare time I will have in the next couple of weeks, I am happy to keep looking at this, run tests, give comments, etc. Best wishes, Sebastian
comment:16
Hi Sebastian, I think that all has been said. I understand your point of view and appreciate the interest you are showing in this ticket. But I also think that this ticket is necessary, and that the current behavior for different-sized polynomials is a bug. I am not currently working a lot on Sage, since I am overloaded with pre-exam and administrative duties, but I will try to finish cleaning up the code in the next two weeks. Best
Attachment: comparison_product_50_400.png
Attachment: comparison_addition_50_400.png
comment:17
Just for the record, I attach two images showing some numbers. comparison_product_50_400.png shows the number of products performed to multiply a fixed polynomial of degree 49 by another polynomial of degree 0 to 399; comparison_addition_50_400.png shows the number of additions performed for the same polynomials.
One can see that, for a big difference in the degrees of the polynomials, the current _mul_karatsuba needs more additions and more products than _mul_generic.
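The counts in the plots can be approximated with a small wrapper class around the coefficient-list sketch from comment:3 above (a hypothetical helper, not the script that produced the attachments; operations involving the integer padding zeros are also counted, so the totals are slight overestimates):

```python
class OpCounter:
    """Wraps a value and counts base-ring additions/subtractions
    and multiplications performed on it."""
    adds = 0
    mults = 0
    def __init__(self, v):
        self.v = v
    @staticmethod
    def _val(x):
        return x.v if isinstance(x, OpCounter) else x
    def __add__(self, other):
        OpCounter.adds += 1
        return OpCounter(self.v + self._val(other))
    __radd__ = __add__
    def __sub__(self, other):
        OpCounter.adds += 1
        return OpCounter(self.v - self._val(other))
    def __rsub__(self, other):
        OpCounter.adds += 1
        return OpCounter(self._val(other) - self.v)
    def __mul__(self, other):
        OpCounter.mults += 1
        return OpCounter(self.v * self._val(other))

f = [OpCounter(1)] * 50    # degree 49
g = [OpCounter(1)] * 400   # degree 399
unbalanced_mul(f, g)       # sketch function from comment:3
print(OpCounter.adds, OpCounter.mults)
```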
comment:18
Attachment: trac_karatsuba_improvements.patch.gz

Apply: trac_karatsuba_improvements.2.patch
Author: Luis Felipe Tabera Alonso
comment:19
Rebased against 4.7.1.
Attachment: trac_karatsuba_improvements.2.patch.gz
comment:21
Apply: trac_10255_karatsuba_improbement.patch
Rebased against 5.13.
comment:23
Attachment: trac_10255_karatsuba_improbement.patch.gz

Apply: trac_10255_karatsuba_improbement.patch
Branch: u/lftabera/ticket/10255
Branch pushed to git repo; I updated commit sha1. New commits:
comment:26
I, for one, believe that having a reasonable generic multiplication is important, no matter how much faster specialized implementations can be. The implementation in this ticket is a significant improvement over the existing one. The code looks fine to me and works well on all examples I tried. (There are still, e.g., a few list copies that would be nice to avoid, but this does not look easy to do without changes to the not-so-uniform interface of polynomials; in any case, this is not a reason to further delay this ticket, IMO. The same goes for the other improvements I can think of.) A few minor points though, all related to documentation and tests:
New commits:
Reviewer: Marc Mezzarobba
Changed branch from u/lftabera/ticket/10255 to u/mmezzarobba/10255-karatsuba
comment:27
Marc, thanks a lot for taking a look at this and working on improving the documentation. I agree with your changes. I have also changed the bogus non-Fibonacci degree. Concerning the random tests, I have moved the comparison of the result against external libraries to rings/tests.py, but I have left the rest of the tests in place. Do you agree with these changes?
Changed branch from u/mmezzarobba/10255-karatsuba to u/lftabera/ticket/10255 |
comment:28
Hi, I mostly agree with your changes, but I think randomized tests over rings other than […] Most importantly, your test used to compare […] If you are OK with my changes, please go ahead and set the ticket to positive_review. New commits:
Changed branch from u/lftabera/ticket/10255 to u/mmezzarobba/10255-karatsuba
Changed author from Luis Felipe Tabera Alonso to Luis Felipe Tabera Alonso, Marc Mezzarobba
comment:29
The code looks good to me. I wrote f._mul_karatsuba thinking of testing commutativity, but now I realize that, assuming the other library I am testing against is correct, it does not make much sense.
Changed reviewer from Marc Mezzarobba to Marc Mezzarobba, Luis Felipe Tabera Alonso
In the generic implementation of univariate polynomials over exact rings, plain Karatsuba multiplication is used.
However, for polynomials of different degrees, the Karatsuba code performs more products and additions than the classical multiplication code. Moreover, for equally sized polynomials, the degree at which Karatsuba starts to beat plain multiplication looks too high to me.
See the attachments comparison_product_50_400.png and comparison_addition_50_400.png for the number of operations when multiplying polynomials of degrees 50 and 400: in yellow, the number of operations using _mul_generic; in red, using the current _mul_karatsuba as of 4.6.1; and in blue, with the proposed patch.
See comment:13 for some benchmarks.
Apply: trac_10255_karatsuba_improbement.patch
Component: basic arithmetic
Keywords: karatsuba, multiplication, polynomial
Author: Luis Felipe Tabera Alonso, Marc Mezzarobba
Branch/Commit: u/mmezzarobba/10255-karatsuba @ 5977470
Reviewer: Marc Mezzarobba, Luis Felipe Tabera Alonso
Issue created by migration from https://trac.sagemath.org/ticket/10255