
Correct karatsuba multiplication of univariate polynomials for different degree polynomials #10255

Closed
lftabera opened this issue Nov 12, 2010 · 46 comments

Comments

@lftabera
Contributor

In the generic implementation of univariate polynomials over exact rings, plain Karatsuba multiplication is used.

However, for polynomials of different degrees, the Karatsuba code performs more products and additions than the classical multiplication code. Moreover, for polynomials of equal size, the degree at which Karatsuba starts to beat plain multiplication looks too high to me.

See the attachments comparison_product_50_400.png and comparison_addition_50_400.png for the number of operations when multiplying polynomials of degree 50 and 400: in yellow, the number of operations using _mul_generic; in red, using the current _mul_karatsuba as of 4.6.1; and in blue, with the proposed patch.

sage: K=QQ[I][x]   
sage: f=K.random_element(1500)
sage: g=K.random_element(1500)
sage: %time _ = f._mul_generic(g)
CPU times: user 6.03 s, sys: 0.08 s, total: 6.11 s
Wall time: 6.43 s
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 6.95 s, sys: 0.06 s, total: 7.01 s
Wall time: 7.76 s

See comment:13 for some benchmarks

Apply: trac_10255_karatsuba_improbement.patch

Component: basic arithmetic

Keywords: karatsuba, multiplication, polynomial

Author: Luis Felipe Tabera Alonso, Marc Mezzarobba

Branch/Commit: u/mmezzarobba/10255-karatsuba @ 5977470

Reviewer: Marc Mezzarobba, Luis Felipe Tabera Alonso

Issue created by migration from https://trac.sagemath.org/ticket/10255

@lftabera
Contributor Author

comment:1

Same example with my patch. Pure karatsuba:

sage: K=QQ[I][x]   
sage: f=K.random_element(1500)
sage: g=K.random_element(1500)
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 1.53 s, sys: 0.01 s, total: 1.54 s
Wall time: 1.73 s

TODO:

- Take a more careful look at the correctness of the code.
- Check that the changes do not introduce performance penalties on a wider set of rings, degrees and polynomial sizes.
- Introduce a user-definable threshold (with a sensible default for the most common rings) below which Karatsuba falls back to classical multiplication.

@lftabera
Contributor Author

comment:2

The code is too naive for polynomials of different sizes. Even with my improvements I get this:

Two degree 849 polynomials, no threshold:

274750 additions

87729 products

One polynomial of degree 849 and one of degree 449:

472812 additions

218649 products

Experimental data

sage: f=ZZ[x].random_element(849)
sage: g=ZZ[x].random_element(849)
sage: %timeit f._mul_karatsuba(g)
5 loops, best of 3: 95.9 ms per loop
sage: f=ZZ[x].random_element(849)
sage: g=ZZ[x].random_element(449)
sage: %timeit f._mul_karatsuba(g)
5 loops, best of 3: 218 ms per loop

The shorter product is more expensive!

I have a patch right now to solve this, have to clean it up though.

@lftabera
Contributor Author

lftabera commented Dec 4, 2010

comment:3

Further improvements for polynomials of different sizes:

If f has degree m-1 and g has degree n-1 with m<n, write g as

g0 + x^m*g1 + x^(2*m)*g2 + ... + x^(q*m)*gq

Compute the products f*gi with Karatsuba and reconstruct f*g from them.
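For illustration, here is a minimal Python sketch of this blocking strategy, operating on coefficient lists (lowest degree first); balanced_mul stands for any balanced Karatsuba routine and is a hypothetical name used only in this sketch:

def unbalanced_mul(f, g, balanced_mul):
    # Assumes len(f) <= len(g); f and g are coefficient lists.
    m = len(f)
    result = [0] * (len(f) + len(g) - 1)
    # Split g into blocks g_0, g_1, ... of length at most m.
    for i in range(0, len(g), m):
        block = balanced_mul(f, g[i:i + m])  # roughly balanced product
        for j, c in enumerate(block):        # accumulate x^i * (f * g_i)
            result[i + j] += c
    return result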

With this strategy, for polynomials of degree 849 and 449 I get:

214507 additions

39942 products

Far better than the previous method. Still, this is not optimal, and it is hard to measure the recursion overhead. I took this strategy from old versions of the GMP documentation.

Some timings (Debian, 64 bits). All polynomials have been generated with random_element without further parameters. Pure Karatsuba without a threshold, for better comparison with the old code; with an appropriate threshold the timings would be slightly better.

Note that for rings like QQ[x] and GF(2)[x], Sage uses faster specialized libraries (FLINT, NTL and the like).

Degree f: 1000, g: 1000

QQ[x] 
old code: 828 ms
new code: 498 ms
GF(2)[x] 
old code: 284 ms
new code: 99.1 ms
QQ[sqrt(2)][x]
old code: 2.61 s
new code: 480 ms # 5 times faster
ZZ[x]['y']['z']['t']
old code: 86.95 s
new code: 71.10 s

Degree f: 849, g: 449

QQ[x] 
old code: 988 ms
new code: 343 ms
GF(2)[x] 
old code: 148 ms
new code: 67.6 ms
QQ[sqrt(2)][x]
old code: 1.93 s
new code: 313 ms #6 times faster
ZZ[x]['y']['z']['t']
old code: 342.16 s
new code: 48.11 s # 7 times faster

Degrees 3 and 5

QQ[x] 
old code: 135 µs
new code: 83.6 µs
GF(2)[x] 
old code: 137 µs
new code: 100 µs
QQ[sqrt(2)][x]
old code: 323 µs
new code: 86.4 µs
ZZ[x]['y']['z']['t']
old code: 19.5 ms
new code: 13.2 ms 

TODO:

  • Sensible default threshold.

  • Documentation cleanup (almost done).

  • More testing for correctness rather than speed. This is a very basic feature, so we cannot allow correctness bugs here.

@lftabera
Contributor Author

lftabera commented Dec 4, 2010

comment:4

All doctests pass, and I am trying to make the doctests independent of future changes to the default threshold. But I get the following difference, which puzzles me.

sage -t -long -force_lib "devel/sage/sage/rings/qqbar.py"   
**********************************************************************
File "/home/rosamari/sage/devel/sage/sage/rings/qqbar.py", line 373:
    sage: sage_input(one, verify=True)
Expected:
    # Verified
    R.<x> = QQbar[]
    v1 = AA(2)
    v2 = QQbar(sqrt(v1))
    v3 = QQbar(3)
    v4 = sqrt(v3)
    v5 = v2*v4
    v6 = (1 - v2)*(1 - v4) - 1 - v5
    v7 = QQbar(sqrt(v1))
    v8 = sqrt(v3)
    si1 = v7*v8
    cp = AA.common_polynomial(x^2 + ((1 - v7)*(1 + v8) - 1 + si1)*x - si1)
    v9 = QQbar.polynomial_root(cp, RIF(-RR(1.7320508075688774), -RR(1.7320508075688772)))
    v10 = 1 - v9
    v11 = v6 + (v10 - 1)
    v12 = -1 - v4 - QQbar.polynomial_root(cp, RIF(-RR(1.7320508075688774), -RR(1.7320508075688772)))
    v13 = 1 + v12
    v14 = v10*(v6 + v5) - (v6 - v5*v9)
    si2 = v5*v9
    AA.polynomial_root(AA.common_polynomial(x^4 + (v11 + (v13 - 1))*x^3 + (v14 + (v13*v11 - v11))*x^2 + (v13*(v14 - si2) - (v14 - si2*v12))*x - si2*v12), RIF(RR(0.99999999999999989), RR(1.0000000000000002)))
Got:
    # Verified
    R.<x> = QQbar[]
    v1 = QQbar(3)
    v2 = sqrt(v1)
    v3 = AA(2)
    v4 = QQbar(sqrt(v3))
    v5 = sqrt(v1)
    si1 = v4*v5
    cp = AA.common_polynomial(x^2 + ((1 - v4)*(1 + v5) - 1 + si1)*x - si1)
    v6 = -1 - v2 - QQbar.polynomial_root(cp, RIF(-RR(1.7320508075688774), -RR(1.7320508075688772)))
    v7 = QQbar.polynomial_root(cp, RIF(-RR(1.7320508075688774), -RR(1.7320508075688772)))
    v8 = QQbar(sqrt(v3))
    v9 = v8*v2
    v10 = (1 - v8)*(1 - v2) - 1 - v9
    v11 = -v7 + v10
    v12 = v6*v11
    si2 = v7*v9
    v13 = (1 - v7)*(v10 + v9) - v10 + si2
    si3 = v6*si2
    AA.polynomial_root(AA.common_polynomial(x^4 + ((1 + v6)*(1 + v11) - 1 - v12)*x^3 + (v12 + v13)*x^2 + ((1 + v6)*(v13 - si2) - v13 + si3)*x - si3), RIF(RR(0.99999999999999989), RR(1.0000000000000002)))
**********************************************************************
1 items had failures:
   1 of 152 in __main__.example_0
***Test Failed*** 1 failures.

@lftabera
Contributor Author

comment:5

I have eliminated karatsuba_sum and karatsuba_diff, so there are fewer function calls and fewer if branches in the code (we know, and can predict at each step, the length of every list).

I have also eliminated most slicing of lists, so fewer objects are created. The recursion does not operate over slices but over the original list plus some indices delimiting the part we are working on.
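As an illustration of the idea, here is a simplified, self-contained Python sketch of a Karatsuba recursion driven by index bounds instead of list slices (a model of the approach only, not the actual code from the patch; kara is a hypothetical name):

def kara(f, g, fl, fh, gl, gh, threshold=8):
    # Multiply f[fl:fh] by g[gl:gh] (coefficient lists, lowest degree
    # first), passing index bounds instead of creating list slices.
    n, m = fh - fl, gh - gl
    if min(n, m) <= max(threshold, 1):  # classical base case
        prod = [0] * (n + m - 1)
        for i in range(n):
            for j in range(m):
                prod[i + j] += f[fl + i] * g[gl + j]
        return prod
    k = min(n, m) // 2  # split: f = f0 + x^k*f1, g = g0 + x^k*g1
    a = kara(f, g, fl, fl + k, gl, gl + k)  # f0*g0
    b = kara(f, g, fl + k, fh, gl + k, gh)  # f1*g1
    # f0 + f1 and g0 + g1 (note that k <= n - k and k <= m - k)
    fs = [(f[fl + i] if i < k else 0) + f[fl + k + i] for i in range(n - k)]
    gs = [(g[gl + i] if i < k else 0) + g[gl + k + i] for i in range(m - k)]
    c = kara(fs, gs, 0, n - k, 0, m - k)  # (f0+f1)*(g0+g1)
    for i, v in enumerate(a):  # c becomes the middle term f0*g1 + f1*g0
        c[i] -= v
    for i, v in enumerate(b):
        c[i] -= v
    prod = [0] * (n + m - 1)
    for i, v in enumerate(a):
        prod[i] += v
    for i, v in enumerate(c):
        prod[k + i] += v
    for i, v in enumerate(b):
        prod[2 * k + i] += v
    return prod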

With these improvements the code is about 10% faster. I also think they improve readability.

For number fields of degree 2 and 3, the code is about three times slower than PARI. For number fields of degree >= 6, Sage seems faster than PARI.

For dense bivariate polynomials in QQ[x][y], multiplication starts to be faster than in QQ[x,y] around degree 16; for dense polynomials of degree 100, QQ[x][y] is 45x faster than QQ[x,y]. Of course, for sparse polynomials, multivariate multiplication is much, much faster.

I have added some doctests comparing Karatsuba with generic multiplication, and a doctest for the non-commutative ring case.

The QQbar doctest failure is still there.

@sagetrac-spancratz
Mannequin

sagetrac-spancratz mannequin commented Jan 8, 2011

comment:6

Hello,

First of all, I want to say that it's great that you are looking at this part of the Sage library and are trying to improve it!

From reading your ticket progress, it seems that perhaps you might benefit from some help with this. Have you considered advertising this problem on the developers' list at sage-devel? Perhaps someone else is interested in this, too.

In any case, here are a couple of comments:

1) What is your main motivation for wanting to improve this code? If you want to achieve faster computations in polynomial rings over specific rings, such as QQ[i], you should most definitely either write some Cython code or even a separate C library. You should *not* be tinkering with the generic Python multiplication code. To keep with the example, if you are interested in QQ[i], then (just as a first idea) you should represent your polynomials internally as sums of two polynomials, one for the real part and one for the imaginary part. This approach also applies to other number fields of small degree.

2) Stating what you definitely already know: Karatsuba multiplication trades multiplications in the base ring for additions. Thus, the threshold depends strongly on the relative speeds of addition and multiplication, and it will be very difficult for you to make a default choice that is useful in general. Even worse, what about base rings where multiplication is faster than addition?

3)  I haven't looked at the details, but I believe you might simply not be able to apply this code to polynomial rings with inexact base rings.

All that said, while this patch looks very interesting, I think there are a lot of things that need to be sorted out. By the way, I am not quite sure whether anyone else is currently reading this ticket, but at least during the next week I should have a lot of time to look at this.

Best wishes,

Sebastian

@lftabera
Contributor Author

lftabera commented Jan 8, 2011

comment:7

Sebastian,

Thanks for looking at this. I feel that the ticket is ready for review, but there is a doctest failure and I wanted to understand why. My impression is that the doctest should be changed, but I have not had enough time to dig into it.

Replying to @sagetrac-spancratz:

1) What is your main motivation for wanting to improve this code? If you want to achieve faster computations in polynomial rings over specific rings, such as QQ[i], you should most definitely either write some Cython code or even a separate C library. You should *not* be tinkering with the generic Python multiplication code. To keep with the example, if you are interested in QQ[i], then (just as a first idea) you should represent your polynomials internally as sums of two polynomials, one for the real part and one for the imaginary part. This approach also applies to other number fields of small degree.

My primary interest is univariate polynomial rings over number fields of degree around 30-40. I was considering using the specific class I added in #8558 because, for number field base rings, the slowness really comes from adding number field elements to the Python integer 0. But then I realised that Karatsuba multiplication is not well implemented when the two polynomials have different degrees, so I changed the Cython code, improving things here and there.

2) Stating what you definitely already know: Karatsuba multiplication trades multiplications in the base ring for additions. Thus, the threshold depends strongly on the relative speeds of addition and multiplication, and it will be very difficult for you to make a default choice that is useful in general. Even worse, what about base rings where multiplication is faster than addition?

That is why the default threshold is zero: no classical multiplication at all. But I have added a ring-dependent parameter to change this value: univariate polynomial rings now have a set_karatsuba_threshold method to deal with this issue. Rings in which addition is faster than multiplication also benefit from Karatsuba, since the number of additions is also subquadratic in the degree of the polynomials.
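For instance, with the patch applied (the exact fall-back semantics below is my reading of the prose above):

sage: K = QQ[I][x]
sage: K.set_karatsuba_threshold(8)  # operands below this size use classical multiplication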

3) I haven't looked at the details, but I believe you might simply not be able to apply this code to polynomial rings with inexact base rings.

By default, unless the user explicitly calls _mul_karatsuba, multiplication for inexact rings uses the classical algorithm. I have not changed this. Also, if a class (ZZ[x], GF(5)[x], etc.) has its own multiplication code, my patch does not change that.

All that said, while this patch looks very interesting, I think there are a lot of things that need to be sorted out. By the way, I am not quite sure whether anyone else is currently reading this ticket, but at least during the next week I should have a lot of time to look at this.

Thanks! If you have any questions about why I changed this or that, or have further suggestions/improvements, do not hesitate to ask.

Luis

@jdemeyer

jdemeyer commented Jan 8, 2011

comment:8

One can actually get a speedup of more than a factor of 10 by using PARI.

The following function converts a polynomial over a number field to a PARI object:

def nfpol_to_pari(f):
    return pari([c._pari_('a') for c in f.list()]).Polrev()

Now let's try your example:

sage: K=QQ[I][x]   
sage: f=K.random_element(1500)
sage: g=K.random_element(1500)
sage: %time _ = f._mul_generic(g)
CPU times: user 3.57 s, sys: 0.00 s, total: 3.57 s
Wall time: 3.59 s
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 5.06 s, sys: 0.05 s, total: 5.11 s
Wall time: 5.12 s
sage: fpari = nfpol_to_pari(f)
sage: gpari = nfpol_to_pari(g)
sage: %time _ = fpari*gpari
CPU times: user 0.26 s, sys: 0.00 s, total: 0.26 s
Wall time: 0.26 s

Since PARI is written entirely in C, it's obvious why one could expect such a speedup. Obviously, writing specialized code to deal with number field polynomials would be even better.

@lftabera
Contributor Author

lftabera commented Jan 9, 2011

comment:9

Obviously you did not try the computations with my patch; the patched code is still much slower than PARI for small number fields (as I acknowledged in the ticket, PARI is around three times faster for degree-two extensions):

sage: K=QQ[I][x]
sage: f=K.random_element(1500)
sage: g=K.random_element(1500)
sage: %time _ = f._mul_generic(g)
CPU times: user 5.94 s, sys: 0.02 s, total: 5.96 s
Wall time: 6.01 s
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 1.31 s, sys: 0.00 s, total: 1.32 s
Wall time: 1.33 s
sage: K.set_karatsuba_threshold(8)
sage: %time _ = f._mul_karatsuba(g)  
CPU times: user 0.97 s, sys: 0.00 s, total: 0.97 s
Wall time: 1.00 s
sage: %time fpari = nfpol_to_pari(f)
CPU times: user 0.24 s, sys: 0.00 s, total: 0.24 s
Wall time: 0.25 s
sage: %time gpari = nfpol_to_pari(g)
CPU times: user 0.24 s, sys: 0.00 s, total: 0.24 s
Wall time: 0.25 s
sage: %time hpari = fpari*gpari
CPU times: user 0.42 s, sys: 0.00 s, total: 0.42 s
Wall time: 0.44 s
sage: %time h = K(hpari)       
CPU times: user 0.70 s, sys: 0.02 s, total: 0.72 s
Wall time: 0.73 s

If you take into account a naive transformation from Sage data to PARI and back, the total time is worse than the _mul_karatsuba time. So if a specialized class were to use PARI, this transformation would have to be taken into account.

This PARI advantage disappears for bigger number fields:

sage: K=CyclotomicField(7)[x]
sage: f=K.random_element(1500)
sage: g=K.random_element(1500)
sage: %time _ = f._mul_karatsuba(g) 
CPU times: user 15.33 s, sys: 0.02 s, total: 15.35 s
Wall time: 15.44 s
sage: %time _ = f._mul_karatsuba(g,8)
CPU times: user 12.96 s, sys: 0.08 s, total: 13.03 s
Wall time: 13.10 s
sage: %time fpari = nfpol_to_pari(f)
CPU times: user 0.30 s, sys: 0.00 s, total: 0.30 s
Wall time: 0.32 s
sage: %time gpari = nfpol_to_pari(g) 
CPU times: user 0.30 s, sys: 0.00 s, total: 0.31 s
Wall time: 0.32 s
sage: %time hpari = fpari*gpari      
CPU times: user 24.26 s, sys: 0.02 s, total: 24.28 s
Wall time: 25.23 s
sage: %time h = K(hpari)             
CPU times: user 1.48 s, sys: 0.03 s, total: 1.50 s
Wall time: 1.52 s

Note: for absolute number fields, I usually set the Karatsuba threshold to 8.

@sagetrac-spancratz
Mannequin

sagetrac-spancratz mannequin commented Jan 9, 2011

comment:10

Hi Luis,

I'm sorry for the delay in my reply. Also, thank you Jeroen for adding the explicit timings. Let me just go through a couple of things:

  • "I feel that the ticket is ready for review"

I don't really agree. It is certainly in a rather useful state, but it requires more people looking at it, using it, and providing feedback.

  • For one, as you say, this "is a very basic feature, so we cannot allow correctness bugs here." There should be a lot more tests. Rather than having a single test case for illustrative purposes (as is the case in many places in the Sage library), there should be more bulk tests here. You could do that within the Sage framework, e.g. by creating a list of f._mul_karatsuba(g) - f*g for many random pairs (f,g), then removing duplicates from the list and checking that the list has length one and only contains 0 (see the sketch after this list).

  • Your code is still noticeably slower than PARI when the base ring is a number field of small degree. I strongly suspect that this will be a much more typical use case than large-degree number fields, and as such this is not acceptable.

  • Another one of your examples is ZZ[x]['y']['z']['t']. I think this is a very bad example. I don't think one should ever expect a user to work with a multivariate polynomial ring in this way just to get some obscure speed-up. Of course, dense and sparse multivariate polynomial rings offer massively different performance characteristics. Perhaps there should be a dense option to MPolynomialRing, but that's not for this ticket. In any case, this example should not be relevant here. Finally, if you really care about dense multivariate polynomials over the integers, I am sure this can be done much, much faster using C or Cython and arrays of mpz_t's for the coefficients. The speed-up from this will be much, much larger than anything you get from nesting a bunch of univariate polynomial rings in Python.

  • Finally, the statement

"If you take into account a naive transformation from sage data to pari and back, the total time is worse than the mul_karatsuba time. So if a specialiczed class would use PARI, this transformation has to be taken into account."

is not true. In short, the code for polynomials over number fields should then be changed so that the internal representation simply is the PARI object --- no converting back and forth between computations whatsoever!
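For concreteness, the bulk test suggested in the first point could look something like this (a sketch only; bulk_check is a hypothetical helper, not something from the patch):

def bulk_check(R, trials=100, degree=50):
    # R is a univariate polynomial ring over an exact base ring.
    diffs = set()
    for _ in range(trials):
        f = R.random_element(degree)
        g = R.random_element(degree)
        diffs.add(f._mul_karatsuba(g) - f*g)
    # After removing duplicates, only the zero polynomial should remain.
    assert diffs == set([R.zero()])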

I have looked at the only remaining failing test case, in QQbar. Unfortunately, I am very puzzled by what could possibly cause it and have no help to offer right now.

By the way, I am sorry if this might sound a little harsh in places. Of course, as said, I am still very happy to keep looking at this etc.

Best wishes,

Sebastian

@lftabera
Contributor Author

lftabera commented Jan 9, 2011

comment:11

Hi Sebastian,

Replying to @sagetrac-spancratz:

Hi Luis,

I'm sorry for the delay in my reply. Also, thank you Jeroen for adding the explicit timings. Let me just go through a couple of things:

  • "I feel that the ticket is ready for review"

I don't really agree. It is certainly in a rather useful state, but it requires more people looking at it, using it, and providing feedback.

  • For one, as you say, this "is a very basic feature, so we cannot allow correctness bugs here." There should be a lot more tests. Rather than having a single test case for illustrative purposes (as is the case in many places in the Sage library), there should be more bulk tests here. You could do that within the Sage framework, e.g. by creating a list of f._mul_karatsuba(g) - f*g for many random pairs (f,g), then removing duplicates from the list and checking that the list has length one and only contains 0.

This is my fault. Of course I have done much more testing (random polynomials over different rings and ranges of degrees), but it is not reflected in the ticket. I will upload the scripts I used for testing.

  • Your code is still noticeably slower than PARI when the base ring is a number field of small degree. I strongly suspect that this will be a much more typical use case than large-degree number fields, and as such this is not acceptable.

Ok, I agree, but this ticket is not about polynomials over number fields, but about multiplication of Polynomial_generic_dense_field elements.

A new class for polynomials over number fields would be good, but that is a different ticket. Also, the claim that PARI is better for small degrees is not so clear to me, at least not without some nontrivial work. K.random_element creates polynomials with very small coefficients and no denominators; such polynomials are not the most common use case in my opinion. For these polynomials, it seems to me that memory management is the problem: creating and destroying lots of number field elements. If you take somewhat bigger coefficients and, especially, coefficients that do have denominators, then PARI is not better, thanks to FLINT integers in Sage.

Note: computations made on an old laptop.

sage: f=K.random_element(15,10**6) /  ZZ.random_element(10**6)  
sage: g=K.random_element(15,10**6) / ZZ.random_element(10**6)
sage: fpari = nfpol_to_pari(f)                              
sage: gpari = nfpol_to_pari(g)                                
sage: %timeit _=f._mul_karatsuba(g,8)
125 loops, best of 3: 4.56 ms per loop
sage: %timeit fpari*gpari            
25 loops, best of 3: 11 ms per loop
sage: f=K.random_element(1500,10**6)  /  ZZ.random_element(10**6)      
sage: g=K.random_element(1500,10**6)  /  ZZ.random_element(10**6)
sage: fpari = nfpol_to_pari(f)
sage: gpari = nfpol_to_pari(g)
sage: %time _=f._mul_karatsuba(g,8)  
CPU times: user 3.90 s, sys: 0.05 s, total: 3.95 s
Wall time: 9.30 s
sage: %time _=fpari*gpari            
CPU times: user 7.17 s, sys: 0.08 s, total: 7.25 s
Wall time: 18.11 s
  • Another one of your examples is ZZ[x]['y']['z']['t']. I think this is a very bad example. I don't think one should ever expect a user to work with a multivariate polynomial ring in this way just to get some obscure speed-up. Of course, dense and sparse multivariate polynomial rings offer massively different performance characteristics. Perhaps there should be a dense option to MPolynomialRing, but that's not for this ticket. In any case, this example should not be relevant here. Finally, if you really care about dense multivariate polynomials over the integers, I am sure this can be done much, much faster using C or Cython and arrays of mpz_t's for the coefficients. The speed-up from this will be much, much larger than anything you get from nesting a bunch of univariate polynomial rings in Python.

Ok, this is a bad example; I needed some rings that produce genuine Polynomial_generic_dense_field elements apart from number fields. I also wanted to compare the algorithm with Singular's multivariate multiplication, which I think is considered fairly good. Note that ZZ[x][y] elements are currently arrays of mpz_t's.

  • Finally, the statement

"If you take into account a naive transformation from Sage data to PARI and back, the total time is worse than the _mul_karatsuba time. So if a specialized class were to use PARI, this transformation would have to be taken into account."

is not true. In short, the code for polynomials over number fields should then be changed so that the internal representation simply is the PARI object --- no converting back and forth between computations whatsoever!

Answered above.

I have looked at the only remaining failing test case, in QQbar. Unfortunately, I am very puzzled by what could possibly cause it and have no help to offer right now.

QQbar is an exact ring, so it will use Karatsuba multiplication. But computations are done inexactly when that is safe. Since the Karatsuba code for polynomials of different degrees has changed, the inexact intermediate results have also changed. I think that this is the reason the doctest output changes here.

By the way, I am sorry if this might sound a little harsh in places. Of course, as said, I am still very happy to keep looking at this etc.

Do not worry; your comments and Jeroen's are certainly welcome.

Luis

@sagetrac-spancratz
Mannequin

sagetrac-spancratz mannequin commented Jan 10, 2011

comment:12

Hi Luis,

Ok, this is a bad example; I needed some rings that produce genuine Polynomial_generic_dense_field elements apart from number fields.

Ok, I agree, but this ticket is not about polynomials over number fields, but about multiplication of Polynomial_generic_dense_field elements.

Well, which base ring is it that you are interested in?

I don't really think there is all that much point in squeezing a small improvement out of the generic case when often much more is available, even more so when it is not clear that your patch improves the run-time across all base rings.

By the way, perhaps other developers might disagree with me. Perhaps it would be a good idea for you to ask about this on sage-devel?

Of course, these are just words... Here's an example:

sage: R.<x> = QQ[I][]
sage: f = R.random_element(1500)
sage: g = R.random_element(1500)
sage: %time _ = f._mul_generic(g)
CPU times: user 4.39 s, sys: 0.01 s, total: 4.40 s
Wall time: 4.42 s
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 5.63 s, sys: 0.04 s, total: 5.67 s
Wall time: 5.73 s
sage: S.<y> = QQ[]
sage: f0 = S([f[i][0] for i in range(f.degree() + 1)])
sage: f1 = S([f[i][1] for i in range(f.degree() + 1)])
sage: g0 = S([g[i][0] for i in range(g.degree() + 1)])
sage: g1 = S([g[i][1] for i in range(g.degree() + 1)])
sage: timeit('f0 * g0 - f1 * g1')
125 loops, best of 3: 2.76 ms per loop
sage: timeit('f1 * g0 + f0 * g1')
125 loops, best of 3: 2.99 ms per loop

Here, the current code takes about 5s, whereas one can easily complete the computation in 6ms. This improvement is much better than the factor of 7 that your patch provides.
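The same trick, packaged as a function (a sketch only, assuming f and g live in QQ[I]['x']; mul_via_components is a hypothetical name):

def mul_via_components(f, g):
    # Split into real and imaginary coordinate polynomials over QQ.
    S = PolynomialRing(QQ, 'y')
    f0 = S([f[i][0] for i in range(f.degree() + 1)])
    f1 = S([f[i][1] for i in range(f.degree() + 1)])
    g0 = S([g[i][0] for i in range(g.degree() + 1)])
    g1 = S([g[i][1] for i in range(g.degree() + 1)])
    # (f0 + I*f1)*(g0 + I*g1) = (f0*g0 - f1*g1) + I*(f1*g0 + f0*g1)
    re, im = f0*g0 - f1*g1, f1*g0 + f0*g1
    R, I = f.parent(), f.base_ring().gen()
    return R([re[i] + I*im[i] for i in range(max(re.degree(), im.degree()) + 1)])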

By the way, I believe a similar trick works for all polynomials over number fields when the degree of the polynomial is typically larger than the degree of the number field.

In the opposite case, when the degree of the number field is typically larger than the degree of the polynomials, I guess one would want to take a different approach.

I also wanted to compare the algorithm with Singular's multivariate multiplication, which I think is considered fairly good.

Hopefully, my code snippet above convinces you that the current implementation of QQ[I]['x'] isn't all that good.

Note that ZZ[x][y] elements are currently arrays of mpz_t's.

I do not think this is true. ZZ['x']['y'] is different from ZZ['x','y']. The latter is implemented as a sparse multivariate polynomial ring. But the former is not implemented as a single contiguous array of mpz_t's; it is a list of coefficients, where each coefficient (i.e., a polynomial in ZZ['x']) is in turn implemented (in FLINT, this time) as an array of mpz_t's. That is to say, here you have nested lists, which is not what I meant: I meant a single contiguous array of mpz_t's. I'm not saying one should do that in the generic case, but if one really cares about dense two-variate polynomials over ZZ, this approach should be favourable.

Anyway, all that said, let's be more positive again and look at your actual code:

  • In _mul_generic, I think you should move some of the isolated and non-edge cases from TESTS to EXAMPLES. TESTS should then contain more bulk tests, as suggested earlier on this ticket.

  • In _mul_generic, your calls to list() will likely create new lists of objects. If they are guaranteed to be shallow copies of the elements, that is ok. This needs to be verified. If they are deep copies, however, this should be changed.

  • The code in do_karatsuba is not very readable. I think this can easily be improved by moving all the comments from the function body --- there are many more comment lines than actual code lines! --- into a brief descriptive documentation in the function header.

Best wishes,

Sebastian

@lftabera
Contributor Author

tests over some rings

@lftabera
Contributor Author

comment:13

Attachment: test_karatsuba.py.gz

Replying to @sagetrac-spancratz:

Hi Luis,

Ok, this is a bad example; I needed some rings that produce genuine Polynomial_generic_dense_field elements apart from number fields.

Ok, I agree, but this ticket is not about polynomials over number fields, but about multiplication of Polynomial_generic_dense_field elements.

Well, which base ring is it that you are interested in?

The fact that I am interested in polynomials over number fields does not invalidate the fact that the current generic Karatsuba method must be changed: it is too inefficient. Do you think that _mul_karatsuba is ok when it is slower than _mul_generic for degree-1500 polynomials?

I don't really think there is all that much point in squeezing a small improvement out of the generic case when often much more is available, even more so when it is not clear that your patch improves the run-time across all base rings.

It does improve across all rings (or at least is not worse), because:

  • It has fewer function calls, less slicing of lists and fewer if branches (low impact).

  • It does not perform lots of unnecessary additions of 0 + ModuleElement (high impact over some base rings, like number fields or orders of number fields).

  • The user has control over the Karatsuba threshold; this can speed things up in some cases. By default the threshold is the same as in the previous implementation, so this is not worse than the previous method (quite ring dependent).

  • The case of polynomials of different sizes is inefficient in the current implementation (high impact over every ring).

This last reason is very important. For instance, take a pair of polynomials over ZZ, one of degree twice the other. _mul_karatsuba is slower than _mul_generic independently of the degree. This is a bug in my opinion.

sage: K=ZZ[x]
sage: f=K.random_element(4000)
sage: g=K.random_element(2000)
sage: h=K.random_element(1500)
sage: %time _= f._mul_generic(g)
CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s
Wall time: 0.90 s
sage: %time _= f._mul_karatsuba(g)
CPU times: user 2.65 s, sys: 0.00 s, total: 2.66 s
Wall time: 2.66 s
sage: %time _= f._mul_karatsuba(h)
CPU times: user 3.41 s, sys: 0.00 s, total: 3.41 s
Wall time: 3.42 s

With patch:

sage: K=ZZ[x]
sage: f=K.random_element(4000)
sage: g=K.random_element(2000)
sage: h=K.random_element(1500)
sage: %time _=f._mul_generic(g)
CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s
Wall time: 0.91 s
sage: %time _=f._mul_karatsuba(g)
CPU times: user 0.24 s, sys: 0.00 s, total: 0.24 s
Wall time: 0.24 s
sage: %time _=f._mul_karatsuba(h)
CPU times: user 0.25 s, sys: 0.00 s, total: 0.25 s
Wall time: 0.25 s

I do not care if FLINT is lightning fast here. This example is just a symptom that _mul_karatsuba is not doing its job compared with _mul_generic. Moreover, multiplying a polynomial of degree 4000 by one of degree 1500 is much slower than multiplying a polynomial of degree 4000 by one of degree 2000. With the patch they cost around the same time, which is still not something I am entirely comfortable with.

More examples over more rings, taking two polynomials, one of degree 27 and the other of degree 31. The first line is the time taken with my patch, the second with Sage 4.6.

ZZ[x]
625 loops, best of 3: 195 µs per loop
625 loops, best of 3: 435 µs per loop

ZZ[I][x]
625 loops, best of 3: 1.05 ms per loop
5 loops, best of 3: 219 ms per loop

ZZ[I,sqrt(2)][x]
25 loops, best of 3: 10.3 ms per loop
5 loops, best of 3: 317 ms per loop

QQ[x]
625 loops, best of 3: 1 ms per loop
125 loops, best of 3: 1.93 ms per loop

Integers(6)[x]
625 loops, best of 3: 263 µs per loop
625 loops, best of 3: 890 µs per loop

ZZ[x]['y']
625 loops, best of 3: 263 µs per loop
125 loops, best of 3: 5.06 ms per loop

QuaternionAlgebra(1,1)[x]
125 loops, best of 3: 2.36 ms per loop
125 loops, best of 3: 4.43 ms per loop

GF(2)[x].fraction_field()[y]
5 loops, best of 3: 44 ms per loop
5 loops, best of 3: 70.6 ms per loop

QQ[x].fraction_field()['y']
5 loops, best of 3: 229 ms per loop
5 loops, best of 3: 290 ms per loop

GF(5)[x].quotient_ring(x^5-x)[x]
5 loops, best of 3: 53.5 ms per loop
5 loops, best of 3: 80.6 ms per loop

SR[x]
625 loops, best of 3: 1.26 ms per loop
125 loops, best of 3: 3.95 ms per loop

QQ['x,y,z']['t']
25 loops, best of 3: 25.7 ms per loop
25 loops, best of 3: 34.1 ms per loop

GF(7^2,'a')[x] #Here the time is virtually the same due to the cost of creating the lists
5 loops, best of 3: 68.8 ms per loop
5 loops, best of 3: 71.3 ms per loop
  • x200 faster for orders of number fields
  • x30 for relative orders
  • x2 faster for quaternion algebras
  • x2 for the symbolic ring

I do not think that this is a small improvement.

By the way, perhaps other developers might disagree with me. Perhaps it would be a good idea for you to ask about this on sage-devel?

I will.

Of course, these are just words... Here's an example:

sage: R.<x> = QQ[I][]
sage: f = R.random_element(1500)
sage: g = R.random_element(1500)
sage: %time _ = f._mul_generic(g)
CPU times: user 4.39 s, sys: 0.01 s, total: 4.40 s
Wall time: 4.42 s
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 5.63 s, sys: 0.04 s, total: 5.67 s
Wall time: 5.73 s
sage: S.<y> = QQ[]
sage: f0 = S([f[i][0] for i in range(f.degree() + 1)])
sage: f1 = S([f[i][1] for i in range(f.degree() + 1)])
sage: g0 = S([g[i][0] for i in range(g.degree() + 1)])
sage: g1 = S([g[i][1] for i in range(g.degree() + 1)])
sage: timeit('f0 * g0 - f1 * g1')
125 loops, best of 3: 2.76 ms per loop
sage: timeit('f1 * g0 + f0 * g1')
125 loops, best of 3: 2.99 ms per loop

Here, the current code takes about 5s, whereas one can easily complete the computation in 6ms. This improvement is much better than the factor of 7 that your patch provides.

Ok, but you are disregarding all other cases.

Anyway, all that said, let's be more positive again and look at your actual code:

  • In _mul_generic, I think you should move some of the isolated and non-edge cases from TESTS to EXAMPLES. TESTS should then contain more bulk tests, as suggested earlier on this ticket.

  • In _mul_generic, your calls to list() will likely create new lists of objects. If they are guaranteed to be shallow copies of the elements, that is ok. This needs to be verified. If they are deep copies, however, this should be changed.

I have not really changed _mul_generic, only moved code around to avoid duplication. Calling list is already in the original code; list is also used for addition, by the way. But I realize something: I have to take care of in-place operations in the code. If base ring elements are immutable (as I think they are expected to be), this will just call non-in-place operations. If they are mutable, multiplication of polynomials would modify the input polynomials, which is wrong. I will change this.

  • The code in do_karatsuba is not very readable. I think this can easily be improved by moving all the comments from the function body --- there are many more comment lines than actual code lines! --- into a brief descriptive documentation in the function header.

I will take a look.

Best regards,

Luis

@lftabera
Contributor Author

comment:14

For the number field specific case, see #10591

@sagetrac-spancratz
Mannequin

sagetrac-spancratz mannequin commented Jan 19, 2011

comment:15

Hi Luis,

I am terribly sorry for not writing on this thread sooner; after returning to the UK following the Sage bug days in Seattle, I was busy catching up on some other things. Anyway...

Ok, I agree, but this ticket is not about polynomials over number fields, but about multiplication of Polynomial_generic_dense_field elements.

Well, which base ring is it that you are interested in?

The fact that I am interested in polynomials over number fields does not invalidate the fact that the current generic Karatsuba method must be changed: it is too inefficient. Do you think that _mul_karatsuba is ok when it is slower than _mul_generic for degree-1500 polynomials?

Of course, it is great that (via this patch!) there might soon be some code which speeds up the generic implementation. But to be honest, personally I don't think the generic implementation is all that important; after all, it is only a fall-back used when no one has mustered enough energy to implement something more specific.

I do not care if FLINT is lightning fast here.

I feel this is a bit of a shame. As rjf mentioned on sage-devel, for many base rings you will be able to come up with something much, much faster than this (by comparison!) small improvement.

This example is just a symptom that _mul_karatsuba is not doing its job compared with _mul_generic. Moreover, multiplying a polynomial of degree 4000 by one of degree 1500 is much slower than multiplying a polynomial of degree 4000 by one of degree 2000.

Of course, you are right, this is ridiculous. Implicit zero-padding (or even explicit, if you assume only references are stored) should be nearly free, so a 4000-by-1500 product shouldn't cost noticeably more than a 4000-by-2000 product. (Of course, ideally it should cost less!)

  • x200 faster for orders of number fields
  • x30 for relative orders
  • x2 faster for quaternion algebras
  • x2 for the symbolic ring

I do not think that this is a small improvement.

Obviously, we can agree to disagree here. In absolute terms, all of the above improvements are fantastic! In relative terms, looking at what would be possible just by writing base ring specific code, however, I am confident you could do better.

Of course, these are just words... Here's an example:

sage: R.<x> = QQ[I][]
sage: f = R.random_element(1500)
sage: g = R.random_element(1500)
sage: %time _ = f._mul_generic(g)
CPU times: user 4.39 s, sys: 0.01 s, total: 4.40 s
Wall time: 4.42 s
sage: %time _ = f._mul_karatsuba(g)
CPU times: user 5.63 s, sys: 0.04 s, total: 5.67 s
Wall time: 5.73 s
sage: S.<y> = QQ[]
sage: f0 = S([f[i][0] for i in range(f.degree() + 1)])
sage: f1 = S([f[i][1] for i in range(f.degree() + 1)])
sage: g0 = S([g[i][0] for i in range(g.degree() + 1)])
sage: g1 = S([g[i][1] for i in range(g.degree() + 1)])
sage: timeit('f0 * g0 - f1 * g1')
125 loops, best of 3: 2.76 ms per loop
sage: timeit('f1 * g0 + f0 * g1')
125 loops, best of 3: 2.99 ms per loop

Ok, but you are disregarding all other cases.

Obviously, and deliberately so. But please let me know which ring you actually, mathematically care about. I would not be surprised if we could come up with something noticeably faster!

By the way, I am not sure whether anyone else is looking at this bit of code. While I can't promise how much spare time I will have in the next couple of weeks, I am happy to keep looking at this, run tests, give comments etc.

Best wishes,

Sebastian

@lftabera
Contributor Author

comment:16

Hi Sebastian,

I think that all has been said. I understand your point of view and appreciate the interest you are showing in this ticket. But I also think that this ticket is necessary and that the current behavior for polynomials of different sizes is a bug.

I am not currently working a lot on Sage, since I am overloaded with pre-exam and administrative duties, but I will try to finish cleaning up the code in the next two weeks.

Best,

@lftabera
Contributor Author

Attachment: comparison_product_50_400.png

Attachment: comparison_addition_50_400.png

@lftabera
Contributor Author

comment:17

Just for the record, I attach two images showing some numbers.

comparison_product_50_400.png shows the number of products performed when multiplying a fixed polynomial of degree 49 by another polynomial of degree 0 to 399.

comparison_addition_50_400.png shows the number of additions performed for the same polynomials.

  • In yellow using algorithm _mul_generic

  • In red using current _mul_karatsuba as of 4.6.1

  • in blue with this patch

We can see that when there is a big difference between the degrees of the polynomials, the current _mul_karatsuba needs more additions and more products than _mul_generic.

@lftabera
Contributor Author

comment:18

Attachment: trac_karatsuba_improvements.patch.gz

Apply: trac_karatsuba_improvements.2.patch

@lftabera
Contributor Author

Author: Luis Felipe Tabera Alonso

@lftabera
Contributor Author

comment:19

rebase against 4.7.1


@lftabera
Contributor Author

Attachment: trac_karatsuba_improvements.2.patch.gz

@lftabera changed the title from "improve karatsuba multiplication of univariate polynomials" to "Correct karatsuba multiplication of univariate polynomials for different degree polynomials" on Feb 23, 2013
@lftabera
Contributor Author

lftabera commented Mar 3, 2013

comment:21

Apply: trac_10255_karatsuba_improbement.patch


@jdemeyer jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013
@lftabera
Contributor Author

rebase 5.13

@lftabera
Contributor Author

comment:23

Attachment: trac_10255_karatsuba_improbement.patch.gz

Apply: trac_10255_karatsuba_improbement.patch

@lftabera
Contributor Author

Branch: u/lftabera/ticket/10255

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Dec 20, 2013

Branch pushed to git repo; I updated commit sha1. New commits:

4d08ddf #10255 Improve karatsuba multiplication.

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Dec 20, 2013

Commit: 4d08ddf

@mezzarobba
Member

comment:26

I for one believe that having a reasonable generic multiplication is important, no matter how much faster specialized implementations can be.

The implementation in this ticket is a significant improvement over the existing one. The code looks fine to me and works well on all examples I tried.

(There are still, e.g., a few list copies that would be nice to avoid, but this does not look easy to do without changes to the not-so-uniform interface of polynomials. In any case, this is not a reason to further delay this ticket IMO. Same goes for the other improvements I can think of.)

A few minor points though, all related to documentation and tests:

  • Shouldn't the test in the docstring of do_karatsuba_different_size read

     sage: K = ZZ[x]
     sage: f = K.random_element(21) # instead of 28
     sage: g = K.random_element(34)
     sage: f*g - f._mul_karatsuba(g,0)
     0
    

    in accordance with the comment about Fibonacci numbers?

  • The tests that call random_element() should probably use sage.misc.random_testing (see the sketch after this list).

  • I pushed a commit with minor changes to some docstrings and comments (no code change). Could you please have a look and tell me if you agree with my changes?
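For illustration, a randomized test using that decorator might look as follows (the names and parameters here are only a sketch, not the actual test from the branch):

from sage.misc.prandom import randint
from sage.misc.random_testing import random_testing

@random_testing
def test_karatsuba(trials, verbose=False):
    # Compare Karatsuba against generic multiplication on random inputs.
    from sage.all import ZZ, PolynomialRing
    R = PolynomialRing(ZZ, 'x')
    for _ in range(trials):
        f = R.random_element(randint(0, 50))
        g = R.random_element(randint(0, 50))
        assert f._mul_karatsuba(g, 0) == f._mul_generic(g)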


New commits:

f6cd17e Improved Karatsuba: cosmetic changes to docstrings and comments

@mezzarobba
Member

Reviewer: Marc Mezzarobba

@mezzarobba
Member

Changed commit from 4d08ddf to f6cd17e

@mezzarobba
Member

Changed branch from u/lftabera/ticket/10255 to u/mmezzarobba/10255-karatsuba

@lftabera
Contributor Author

Changed commit from f6cd17e to 15dabc3

@lftabera
Contributor Author

comment:27

Marc,

Thanks a lot for taking a look at this and working on improving the documentation. I agree with your changes. I have also changed the bogus non-Fibonacci degree.

Concerning the random tests, I have moved the comparison of the result with external libraries to rings/tests.py, but I have left the rest of the tests in place. Do you agree with these changes?

@lftabera
Contributor Author

Changed branch from u/mmezzarobba/10255-karatsuba to u/lftabera/ticket/10255

@mezzarobba
Member

comment:28

Hi,

I mostly agree with your changes, but I think randomized tests over rings other than ZZ would not be a luxury. So I modified test_karatsuba_multiplication to take the base ring as input. Can you have a look at my patch?

Most importantly, your test used to compare g*f to f._mul_karatsuba(g, threshold). I changed g*f to (essentially) f*g in order to handle noncommutative base rings. But perhaps you wrote g*f deliberately? I also made the degree bound passed to R.random_element random. And I changed ZZ['x'] to PolynomialRing(base_ring, 'x') because some rings currently don't support the former syntax.

If you are ok with my changes, please go ahead and set the ticket to positive_review.


New commits:

5977470 Improved Karatsuba: more thorough tests

@mezzarobba
Member

Changed commit from 15dabc3 to 5977470

@mezzarobba
Member

Changed branch from u/lftabera/ticket/10255 to u/mmezzarobba/10255-karatsuba

@lftabera
Contributor Author

Changed author from Luis Felipe Tabera Alonso to Luis Felipe Tabera Alonso, Marc Mezzarobba

@lftabera
Contributor Author

comment:29

The code looks good to me. I wrote f._mul_karatsuba thinking of testing commutativity, but now I realize that, assuming the other library I am testing against is correct, it does not make much sense.

@lftabera
Contributor Author

Changed reviewer from Marc Mezzarobba to Marc Mezzarobba, Luis Felipe Tabera Alonso
