Base: twiceprecision: optimize mul12 #49568
Conversation
This is a good improvement. That said, it might make sense to do the deduplication now.
Force-pushed from 4c4b004 to 573cdc6.
ping

Actually,

Yeah, I think I agree with you that it might be worth just deleting
It is well known and obvious that the algorithm behind `Base.mul12` (sometimes known as "Fast2Mult") doesn't require a "Fast2Sum" (known in the Julia codebase as `Base.canonicalize2`) call at the end, so remove it. This follows from the fact that IEEE-754 floating-point multiplication is required to be correctly rounded, so Fast2Sum can't change the result. See, for example, the beginning of https://doi.org/10.1145/3121432 by Joldes, Muller, and Popescu.

Furthermore, `Base.Math.two_mul` already exists, so use it as a kernel function. This required adding a general fallback method for `Base.Math.two_mul`, needed, for example, for `BigFloat`.

Also removed the `iszero` check, which is now clearly unnecessary.
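The claim that Fast2Sum is redundant after Fast2Mult can be checked outside Julia too. Here is a minimal Python sketch (helper names are made up; Dekker's fma-free product via Veltkamp splitting stands in for `Base.Math.two_mul`): because the high part is the correctly rounded product, the low part is already at most half an ulp of the high part, so a Fast2Sum pass returns the pair unchanged.

```python
def fast2sum(a, b):
    """Fast2Sum (the algorithm behind Base.canonicalize2); assumes |a| >= |b|."""
    s = a + b
    t = b - (s - a)  # exact rounding error of the sum
    return s, t

def two_prod(a, b):
    """Error-free product of two doubles via Dekker/Veltkamp splitting,
    an fma-free stand-in for two_mul (valid away from overflow)."""
    split_const = 134217729.0  # 2**27 + 1 for binary64

    def split(x):
        c = split_const * x
        hi = c - (c - x)
        return hi, x - hi

    p = a * b  # correctly rounded high part
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err

h, l = two_prod(1.1, 2.2)
# h is the correctly rounded product, so |l| <= ulp(h)/2 already
# and Fast2Sum leaves the pair untouched:
print(fast2sum(h, l) == (h, l))
```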
This new function differs from the old in about 1% of cases, but by at most one ulp for finite results.
Do you have an example, please?

Also, for Float16, we should probably be doing this via Float32 mul anyway.
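One way to see why routing Float16 through a Float32 multiply works: two p-bit significands multiply exactly in any format with at least 2p significand bits (11 + 11 = 22 ≤ 24). A hedged Python sketch of the same widening trick one level up, computing an error-free float32 product in double precision with no fma at all (the helper names are invented for illustration):

```python
import struct

def round_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def two_mul_f32_via_f64(a, b):
    """Error-free float32 product computed in double precision, no fma.

    a and b must be float32 values: their 24-bit significands give a
    product of at most 48 significand bits, which a 53-bit double holds
    exactly.
    """
    p = a * b           # exact in double
    hi = round_f32(p)   # the rounded float32 product
    lo = p - hi         # exact by Sterbenz's lemma: the rounding error
    return hi, lo

a = round_f32(1.1)
b = round_f32(2.3)
hi, lo = two_mul_f32_via_f64(a, b)
print(hi + lo == a * b)  # the pair reconstructs the exact product
```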
An example with `Float16`:

I don't have any reason to believe that a 1-ulp discrepancy is a problem; I just noticed it and wanted to point it out. I was using Float16 because I can check all pairs of `Float16` values.

Code to generate examples:

Here's code to generate all 37,671,208 examples in the Float16 case.

```julia
julia> function two_mul(x::T, y::T) where {T<:Number}
           xy = x*y
           xy, fma(x, y, -xy)
       end;

julia> function mul12_old(x::T, y::T) where {T<:AbstractFloat}
           h = x * y
           ifelse(iszero(h) | !isfinite(h), (h, h), Base.canonicalize2(h, fma(x, y, -h)))
       end;

julia> function mul12_new(x::T, y::T) where {T<:AbstractFloat}
           (h, l) = two_mul(x, y)
           ifelse(!isfinite(h), (h, h), (h, l))
       end;

julia> function get_errors()
           errors = NTuple{2, Float16}[]
           for i in 0x0:typemax(UInt16)
               for j in 0x0:typemax(UInt16)
                   a = reinterpret(Float16, i)
                   b = reinterpret(Float16, j)
                   isfinite(a) && isfinite(b) || continue
                   mul12_old(a, b) != mul12_new(a, b) && push!(errors, (a, b))
               end
           end
           errors
       end; @time errors = get_errors();
  9.056077 seconds (20 allocations: 161.329 MiB, 0.58% gc time)
```

Here's code to generate examples in the Float64 case:

```julia
julia> using Base.Threads

julia> f() = while true
           a = reinterpret(Float64, rand(UInt64))
           isfinite(a) || continue
           x = rand(0x0:typemax(UInt64)-typemax(UInt32))
           for i in x:x+typemax(UInt32)
               b = reinterpret(Float64, i)
               if isfinite(b) && mul12_old(a, b) != mul12_new(a, b)
                   return "mul12($a,$b)"
               end
           end
       end; @sync for i in 1:Threads.nthreads() @spawn println(f()) end
```
Ah, right. The difference is just the sign of the low piece when the high bit is exactly halfway between two floats. This isn't a problem.
Interesting examples. This seems to be a case of the error-free transformation failing to be error-free, as a result of limited exponent range. Compensated floating-point calculations always rely on the exponent range being large enough; the surprising thing here is that the calculation fails even though there was no overflow or underflow.

```julia
julia> function two_mul(x::T, y::T) where {T<:Number}
           xy = x*y
           xy, fma(x, y, -xy)
       end
two_mul (generic function with 1 method)

julia> (a, b) = (Float16(0.0003116), Float16(0.836))
(Float16(0.0003116), Float16(0.836))

julia> a * b
Float16(0.0002606)

julia> m = two_mul(a, b)
(Float16(0.0002606), Float16(-1.0e-7))

julia> sum(m)
Float16(0.0002604)
```

Even though

For
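The same failure mode can be reproduced for Float64 near its own underflow threshold, without fma: pick a product whose value is normal but whose rounding error falls below the smallest subnormal, so no low part can represent it. A sketch in Python, using exact rational arithmetic to expose the unrepresentable error (the particular constant is just one convenient choice):

```python
from fractions import Fraction

# A double whose square lands just above the smallest normal, 2.0**-1022.
# The construction is exact: (1 + 2**-52) is representable, and
# multiplying by a power of two only shifts the exponent.
a = (1 + 2**-52) * 2**-511

p = a * a  # correctly rounded, and p itself is normal: no underflow here

# The true rounding error of p is 2**-1126, far below the smallest
# subnormal 2**-1074, so it is not representable by ANY double:
exact_err = Fraction(a) * Fraction(a) - Fraction(p)
print(exact_err != 0 and float(exact_err) == 0.0)  # nonzero error rounds to 0.0
```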