
Base: twiceprecision: optimize mul12 #49568

Merged (1 commit, May 4, 2023)

Conversation

@nsajko (Contributor) commented Apr 29, 2023

The algorithm behind Base.mul12 (sometimes known as "Fast2Mult") doesn't require a "Fast2Sum" (known in the Julia codebase as Base.canonicalize2) call at the end, so remove it. This follows from the fact that IEEE-754 floating-point multiplication is required to be correctly rounded, so Fast2Sum can't change the result.

See, for example, the beginning of https://doi.org/10.1145/3121432 by Joldes, Muller, and Popescu.

Furthermore, Base.Math.two_mul already exists, so use it as a kernel function. This required adding a general fallback method for Base.Math.two_mul, needed, for example, for BigFloat.

Also remove the iszero check, which is now unnecessary.
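
(Why the Fast2Sum is redundant: h = x*y is correctly rounded, so the residual l = fma(x, y, -h) satisfies h + l == x*y exactly whenever the fma doesn't underflow, and Base.canonicalize2(h, l) then returns (h, l) unchanged. A quick REPL check of this claim, not part of the PR:)

julia> x, y = 1.0 + rand(), 1.0 + rand();

julia> h = x * y; l = fma(x, y, -h);  # Fast2Mult: high part and exact residual

julia> abs(l) <= eps(h) / 2  # residual is at most half an ulp of h
true

julia> Base.canonicalize2(h, l) == (h, l)  # Fast2Sum is a no-op here
true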

@oscardssmith oscardssmith self-requested a review April 29, 2023 18:16
@oscardssmith (Member)

This is a good improvement. That said, it might make sense to do the deduplication now.

@oscardssmith oscardssmith added the performance Must go faster label Apr 29, 2023
@nsajko nsajko force-pushed the mul12 branch 2 times, most recently from 4c4b004 to 573cdc6 on April 29, 2023 21:24
@nsajko (Contributor, Author) commented May 2, 2023

ping

@nsajko (Contributor, Author) commented May 2, 2023

Actually, since mul12 is a very low-level function, it shouldn't have either the iszero check or the isfinite check; that should be handled at a higher level. I'll try to improve that in a follow-up PR.

@oscardssmith (Member)

Yeah, I think I agree with you that it might be worth just deleting mul12 and only using Base.two_mul.

nsajko added two commits to nsajko/julia that referenced this pull request May 4, 2023
@KristofferC KristofferC merged commit c90aa77 into JuliaLang:master May 4, 2023
@nsajko nsajko deleted the mul12 branch May 4, 2023 11:05
@LilithHafner (Member)

The new function differs from the old in about 1% of cases, but by at most one ulp, over all finite Float16s.

@nsajko (Contributor, Author) commented May 6, 2023

Do you have an example, please?

@oscardssmith (Member)

Also, for Float16, we should probably be doing this via Float32 mul anyway.
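
(One possible shape of that idea, as a hypothetical sketch rather than code from the PR: two Float16 significands have 11 bits each, so their exact product fits in Float32's 24-bit significand.)

julia> function mul12_via_f32(x::Float16, y::Float16)
           p = Float32(x) * Float32(y)   # exact: the 22-bit product fits in Float32
           h = Float16(p)                # correctly rounded high part
           l = Float16(p - Float32(h))   # the subtraction is exact in Float32 (Sterbenz)
           (h, l)
       end;

(The final rounding of l to Float16 can still lose bits for products near the bottom of the exponent range, the same caveat as for the fma-based version.)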

@LilithHafner (Member)

An example with Float64 is mul12(-2.609923178752538e-225, 4.707396092301997e-78)
An example with Float32 is mul12(-3.219508f-16, 7.302329f-23)
An example with Float16 is mul12(Float16(0.0003116), Float16(0.836))

I don't have any reason to believe that a 1-ulp discrepancy is a problem, I just noticed it and wanted to point it out.

I was using Float16 because I can check all pairs of Float16 values explicitly, while that is very hard to do for Float32 and impossible for Float64.

Code to generate examples

Here's code to generate all 37,671,208 examples in the Float16 case.

julia> function two_mul(x::T, y::T) where {T<:Number}
           xy = x*y
           xy, fma(x, y, -xy)  # product and its residual via fma
       end;

julia> function mul12_old(x::T, y::T) where {T<:AbstractFloat}
           h = x * y
           ifelse(iszero(h) | !isfinite(h), (h, h), Base.canonicalize2(h, fma(x, y, -h)))
       end;

julia> function mul12_new(x::T, y::T) where {T<:AbstractFloat}
           (h, l) = two_mul(x, y)
           ifelse(!isfinite(h), (h, h), (h, l))
       end;

julia> function get_errors()
           errors = NTuple{2, Float16}[]
           for i in 0x0:typemax(UInt16)
               for j in 0x0:typemax(UInt16)
                   a = reinterpret(Float16, i)
                   b = reinterpret(Float16, j)
                   isfinite(a) && isfinite(b) || continue
                   mul12_old(a, b) != mul12_new(a, b) && push!(errors, (a,b))
               end
           end
           errors
       end; @time errors = get_errors();
  9.056077 seconds (20 allocations: 161.329 MiB, 0.58% gc time)

Here's code to generate examples in the Float64 case

julia> using Base.Threads

julia> f() = while true
           a = reinterpret(Float64, rand(UInt64))
           isfinite(a) || continue
           x = rand(0x0:typemax(UInt64)-typemax(UInt32))
           for i in x:x+typemax(UInt32)
               b = reinterpret(Float64, i)
               if isfinite(b) && mul12_old(a, b) != mul12_new(a, b)
                   return "mul12($a,$b)"
               end
           end
       end; @sync for i in 1:Threads.nthreads() @spawn println(f()) end

@oscardssmith (Member)

Ah, right. The difference is just in the choice of high part, and hence the sign of the low piece, when the value is exactly halfway between two floats. This isn't a problem.
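
(For the Float16 example above, the two pairs are adjacent representations of the same value; a quick check, reusing the mul12_old/mul12_new definitions from the previous comment:)

julia> old = mul12_old(Float16(0.0003116), Float16(0.836));

julia> new = mul12_new(Float16(0.0003116), Float16(0.836));

julia> old == new  # the pairs differ...
false

julia> Float32(old[1]) + Float32(old[2]) == Float32(new[1]) + Float32(new[2])  # ...but sum to the same value
true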

@nsajko (Contributor, Author) commented May 7, 2023

Interesting examples.

This seems to be a case of the error-free transformation failing to be error-free as a result of the limited exponent range. Compensated floating-point calculations always rely on the exponent range being large enough; the surprising thing here is that the calculation fails even though there was no overflow or underflow.

julia> function two_mul(x::T, y::T) where {T<:Number}
           xy = x*y
           xy, fma(x, y, -xy)
       end
two_mul (generic function with 1 method)

julia> (a, b) = (Float16(0.0003116), Float16(0.836))
(Float16(0.0003116), Float16(0.836))

julia> a * b
Float16(0.0002606)

julia> m = two_mul(a, b)
(Float16(0.0002606), Float16(-1.0e-7))

julia> sum(m)
Float16(0.0002604)

Even though two_mul is obviously the algorithm that is proven correct, it fails in this case. I guess this is the reason:

[image: the theorem from the reference giving the condition under which two_mul (Fast2Mult) is error-free, in terms of e_min + p - 1]

For Float16 the precision p is 11 (mantissa length + 1), while the minimal exponent e_min is -14. So e_min + p - 1 ends up being the unexpectedly high value of -4. Conclusion: don't rely on error-free transformations more than the size of your exponent range allows.
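
(A quick REPL check of those numbers, assuming the theorem's condition is that the exponents satisfy e_x + e_y >= e_min + p - 1:)

julia> p, e_min = precision(Float16), exponent(floatmin(Float16))
(11, -14)

julia> e_min + p - 1
-4

julia> exponent(Float16(0.0003116)) + exponent(Float16(0.836))  # -13 < -4, so the guarantee doesn't apply
-13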

@LilithHafner LilithHafner added the maths Mathematical functions label Jun 4, 2023