More fall backs to parent working and AbstractArray2
#120
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
+ Coverage   85.86%   85.90%    +0.04%
==========================================
  Files           8       10        +2
  Lines        1528     1618       +90
==========================================
+ Hits         1312     1390       +78
- Misses        216      228       +12
==========================================
Continue to review full report at Codecov.
I'm still trying to improve coverage, but I'm having problems with
The ReinterpretArray case is particularly important, as it's essential for efficiently going from an array of structs to a struct of arrays. E.g., if you have an array of structs, you can do a reshaped reinterpret:

julia> using VectorizationBase

julia> colors = [(R = rand(Float32), G = rand(Float32), B = rand(Float32)) for _ in 1:200];

julia> colormat = reinterpret(reshape, Float32, colors)
3×200 reinterpret(reshape, Float32, ::Vector{NamedTuple{(:R, :G, :B), Tuple{Float32, Float32, Float32}}}) with eltype Float32:
 0.533352  0.27818   0.185949  0.178599  0.942     0.52459    0.433129  0.973397  0.119951  0.320551  …  0.266196  0.555827  0.211765   0.547991  0.423878  0.614966  0.292452   0.650146   0.412988
 0.103244  0.26153   0.471125  0.162027  0.629155  0.0509689  0.717035  0.853053  0.811974  0.199296     0.544779  0.968282  0.213631   0.969433  0.33376   0.722512  0.287656   0.40756    0.319273
 0.335378  0.912518  0.8327    0.896587  0.424958  0.964567   0.346463  0.883965  0.82452   0.807947     0.984961  0.765952  0.0311273  0.902841  0.635449  0.420161  0.0574269  0.0528734  0.335193

And then using the Unroll index:

help?> VectorizationBase.Unroll
Unroll{AU,F,N,AV,W,M,X}(i::I)
• AU: Unrolled axis
• F: Factor, step size per unroll. If AU == AV, F == W means successive loads. 1 would mean offset by 1, e.g. x[1:8], x[2:9], and x[3:10].
• N: How many times is it unrolled
• AV: Vectorized axis # 0 means not vectorized, some sort of reduction
• W: vector width
• M: bitmask indicating whether each factor is masked
• X: stride between loads of vectors along axis AV.
• i::I - index
julia> vload(stridedpointer(colormat), VectorizationBase.Unroll{1,1,3,2,16}((1,1)))
3 x Vec{16, Float32}
Vec{16, Float32}<0.5333525f0, 0.27818024f0, 0.18594885f0, 0.17859852f0, 0.94200015f0, 0.52458966f0, 0.4331286f0, 0.97339725f0, 0.11995125f0, 0.32055116f0, 0.44533563f0, 0.83760893f0, 0.045946956f0, 0.59923065f0, 0.99825144f0, 0.013801932f0>
Vec{16, Float32}<0.103244424f0, 0.26153028f0, 0.47112453f0, 0.16202688f0, 0.62915516f0, 0.050968885f0, 0.7170353f0, 0.85305285f0, 0.81197417f0, 0.19929564f0, 0.7722609f0, 0.026382446f0, 0.9953768f0, 0.03576827f0, 0.44099808f0, 0.33057177f0>
Vec{16, Float32}<0.33537757f0, 0.9125179f0, 0.8326999f0, 0.89658654f0, 0.42495847f0, 0.96456707f0, 0.34646308f0, 0.88396525f0, 0.82452047f0, 0.8079473f0, 0.67117286f0, 0.53255033f0, 0.27501118f0, 0.8738842f0, 0.6906295f0, 0.44903874f0>

You can load vectors of colors; the three vectors above are R, G, and B. The reason the static strides are essential for making this efficient is that they let you avoid slow

# julia> @code_native syntax=:intel debuginfo=:none vload(stridedpointer(colormat), VectorizationBase.Unroll{1,1,3,2,16}((1,1)))
.text
mov rax, qword ptr [rdx]
mov rcx, qword ptr [rdx + 8]
lea rcx, [rcx + 2*rcx]
shl rcx, 2
lea rax, [rcx + 4*rax]
mov rcx, qword ptr [rsi]
vmovups zmm0, zmmword ptr [rcx + rax - 16]
vmovups zmm1, zmmword ptr [rcx + rax + 48]
vmovups zmm2, zmmword ptr [rcx + rax + 112]
movabs rax, offset .rodata
vmovaps zmm3, zmmword ptr [rax]
mov rax, rdi
vpermi2ps zmm3, zmm0, zmm1
movabs rcx, 139882633240320
vmovaps zmm4, zmmword ptr [rcx]
vpermi2ps zmm4, zmm3, zmm2
movabs rcx, 139882633240384
vmovaps zmm3, zmmword ptr [rcx]
movabs rcx, 139882633240448
vmovaps zmm5, zmmword ptr [rcx]
vpermi2ps zmm3, zmm1, zmm0
vpermi2ps zmm5, zmm3, zmm2
movabs rcx, offset .rodata.cst32
vbroadcasti64x4 zmm3, ymmword ptr [rcx] # zmm3 = mem[0,1,2,3,0,1,2,3]
vpermd zmm2, zmm3, zmm2
movabs rcx, 139882633240512
vmovaps zmm3, zmmword ptr [rcx]
vpermi2ps zmm3, zmm0, zmm1
mov cl, -32
kmovd k1, ecx
vmovapd zmm3 {k1}, zmm2
vmovaps zmmword ptr [rdi], zmm4
vmovaps zmmword ptr [rdi + 64], zmm5
vmovapd zmmword ptr [rdi + 128], zmm3
vzeroupper
ret
nop word ptr cs:[rax + rax]

Note the performance gain static sizing gives us:

julia> @benchmark vload(stridedpointer($colormat), VectorizationBase.Unroll{1,1,3,2,16}((1,1)))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 2.460 ns (0.00% GC)
median time: 2.955 ns (0.00% GC)
mean time: 2.984 ns (0.00% GC)
maximum time: 20.861 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> colormatrix = copy(colormat); typeof(colormatrix)
Matrix{Float32} (alias for Array{Float32, 2})
julia> @benchmark vload(stridedpointer($colormatrix), VectorizationBase.Unroll{1,1,3,2,16}((1,1)))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 10.010 ns (0.00% GC)
median time: 10.189 ns (0.00% GC)
mean time: 10.417 ns (0.00% GC)
maximum time: 35.157 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999

The copy/
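To make the role of the static strides concrete, here is a hedged sketch of how one could inspect them with ArrayInterface (the values in the comments are what I'd expect from the discussion above, not output copied from this PR):

```julia
using ArrayInterface

# For the reshaped reinterpret, both strides should be known at compile time,
# e.g. (static(1), static(3)), because the layout is fixed by the element types.
ArrayInterface.strides(colormat)

# For the plain Matrix, only the contiguous first stride is static; the column
# stride is a runtime Int, e.g. (static(1), 3) for this 3×200 array.
ArrayInterface.strides(colormatrix)
```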
I think using abstract types as an easy opt-in is fine, because we can't make Base methods use our traits. Splitting things into more files is fine too, but I'll have to take a deeper look at this PR later.
src/ArrayInterface.jl
Outdated

Base.axes(A::AbstractArray2, dim) = ArrayInterface.axes(A, dim)

Base.strides(A::AbstractArray2) = ArrayInterface.strides(A)
Base.strides(A::AbstractArray2, dim) = ArrayInterface.strides(A, dim)
Should the size and stride methods convert to Int to avoid potential type instabilities/problems in generic code unaware of ArrayInterface?
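Concretely, the suggestion would amount to something like this (a sketch, not necessarily what the PR should do):

```julia
Base.size(A::AbstractArray2) = map(Int, ArrayInterface.size(A))
Base.strides(A::AbstractArray2) = map(Int, ArrayInterface.strides(A))
```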
These only exist to pass things forward to the ArrayInterface implementation. But if you look at the actual implementation...

axes(a, dim) = axes(a, to_dims(a, dim))
function axes(a::A, dim::Integer) where {A}
    if parent_type(A) <: A
        return Base.axes(a, Int(dim))
    else
        return axes(parent(a), to_parent_dims(A, dim))
    end
end

...StaticInt is never passed to a method in Base but persists internally, preventing premature conversion of StaticInt to an Int and a reliance on constant propagation for inference.
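To illustrate the fallback with a hypothetical wrapper (not from this PR): because parent_type points at the wrapped array, ArrayInterface.axes recurses into the parent rather than hitting the Base.axes branch.

```julia
using ArrayInterface

# Hypothetical single-parent wrapper; a real array type would also define
# Base.size, Base.getindex, etc.
struct PlainWrapper{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end
ArrayInterface.parent_type(::Type{PlainWrapper{T,N,P}}) where {T,N,P} = P
Base.parent(A::PlainWrapper) = A.parent

# parent_type of the wrapper is the wrapped array's type, not the wrapper itself,
# so ArrayInterface.axes(A, dim) takes the `axes(parent(a), ...)` branch and any
# static axis information the parent carries is preserved.
```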
If I define a statically sized AbstractArray2 using ArrayInterface, then Base.size will suddenly return StaticInts as well.
I thought it made sense because StaticInt is part of the public interface and we want to propagate static information, avoiding a dependence on constant propagation when dealing with deeply nested types. I'm sure we will run into some situations where methods explicitly require Int, but I also assumed that the end goal was for StaticInt to become standard.
My thinking was that users might assume that size returns an NTuple{N,Int}, and end up inadvertently writing code like myprod below, which doesn't handle heterogeneous tuples well:
julia> using ArrayInterface: StaticInt
julia> sz = (StaticInt(4), 5);
julia> @code_warntype prod(sz)
MethodInstance for prod(::Tuple{StaticInt{4}, Int64})
from prod(x::Tuple{Any, Vararg{Any}}) in Base at tuple.jl:480
Arguments
#self#::Core.Const(prod)
x::Tuple{StaticInt{4}, Int64}
Body::Int64
1 ─ %1 = Core._apply_iterate(Base.iterate, Base.:*, x)::Int64
└── return %1
julia> function myprod(x)
p = 1
for a ∈ x
p *= a
end
p
end
myprod (generic function with 1 method)
julia> @code_warntype myprod(sz)
MethodInstance for myprod(::Tuple{StaticInt{4}, Int64})
from myprod(x) in Main at REPL[10]:1
Arguments
#self#::Core.Const(myprod)
x::Tuple{StaticInt{4}, Int64}
Locals
@_3::Union{Nothing, Tuple{Union{StaticInt{4}, Int64}, Int64}}
p::Int64
a::Union{StaticInt{4}, Int64}
Body::Int64
1 ─ (p = 1)
│ %2 = x::Tuple{StaticInt{4}, Int64}
│ (@_3 = Base.iterate(%2))
│ %4 = (@_3::Core.Const((static(4), 2)) === nothing)::Core.Const(false)
│ %5 = Base.not_int(%4)::Core.Const(true)
└── goto #4 if not %5
2 ┄ %7 = @_3::Union{Tuple{StaticInt{4}, Int64}, Tuple{Int64, Int64}}
│ (a = Core.getfield(%7, 1))
│ %9 = Core.getfield(%7, 2)::Int64
│ (p = p * a)
│ (@_3 = Base.iterate(%2, %9))
│ %12 = (@_3 === nothing)::Bool
│ %13 = Base.not_int(%12)::Bool
└── goto #4 if not %13
3 ─ goto #2
4 ┄ return p
julia> @btime myprod($(Ref(sz))[])
14.331 ns (1 allocation: 16 bytes)
20
julia> @btime prod($(Ref(sz))[])
0.989 ns (0 allocations: 0 bytes)
20
I don't want to accidentally cause bad performance in other libraries. I have no idea how prevalent code like this is though, and would hope authors would fix it.
I also figure that if code is set up to take advantage of static size information, it could probably also take the step of using ArrayInterface.size & co. instead. E.g., many of my libraries have using ArrayInterface: size, strides.
But if code is written in ignorance of these, is it more likely that they'd benefit from the forced constant prop, or that they'd be hurt by the need for constant prop to ensure type stability?
As is, I have definitions like the following:
@inline Base.size(A::AbstractStrideArray) = map(Int, size(A))
@inline Base.strides(A::AbstractStrideArray) = map(Int, strides(A))
@inline create_axis(s, ::Zero) = CloseOpen(s)
@inline create_axis(s, ::One) = One():s
@inline create_axis(s, o) = CloseOpen(s, s+o)
@inline ArrayInterface.axes(A::AbstractStrideArray) = map(create_axis, size(A), offsets(A))
@inline Base.axes(A::AbstractStrideArray) = axes(A)
@inline ArrayInterface.offsets(A::PtrArray) = getfield(getfield(A, :ptr), :offsets)
@inline ArrayInterface.static_length(A::AbstractStrideArray) = prod(size(A))
# type stable, because index known at compile time
@inline type_stable_select(t::NTuple, ::StaticInt{N}) where {N} = t[N]
@inline type_stable_select(t::Tuple, ::StaticInt{N}) where {N} = t[N]
# type stable, because the tuple is homogeneous
@inline type_stable_select(t::NTuple, i::Integer) = t[i]
# make the tuple homogeneous before indexing
@inline type_stable_select(t::Tuple, i::Integer) = map(Int, t)[i]
@inline function ArrayInterface.axes(A::AbstractStrideVector, i::Integer)
if i == 1
o = type_stable_select(offsets(A), i)
s = type_stable_select(size(A), i)
return create_axis(s, o)
else
return One():1
end
end
@inline function ArrayInterface.axes(A::AbstractStrideArray, i::Integer)
o = type_stable_select(offsets(A), i)
s = type_stable_select(size(A), i)
create_axis(s, o)
end
@inline Base.axes(A::AbstractStrideArray, i::Integer) = axes(A, i)
These are meant to avoid placing too many performance foot-guns about.
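A quick illustration of why the heterogeneous-tuple branch converts with map(Int, t) first (assuming the definitions above; expected results are in the comments):

```julia
using ArrayInterface: StaticInt

t = (StaticInt(4), 5)                 # heterogeneous Tuple{StaticInt{4}, Int64}

type_stable_select(t, StaticInt(1))   # index known at compile time: returns static(4), fully inferred
type_stable_select(t, 1)              # runtime index: tuple is first converted via map(Int, t), returns 4::Int
```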
Thanks for the examples, Chris. I think this nicely illustrates the problems with returning StaticInt in the wild.
Introducing StaticInt into the public space will undoubtedly cause problems, although StaticArrays already does this and returns SOneTo for its axes. Perhaps another way of approaching this is asking ourselves if we really do plan for something like StaticInt to be passed around in the future. If we do, then at what point do we start pushing people to make that adjustment? Having to move back and forth between static and dynamic space so that users don't have to learn about things like NTuple{N} vs Tuple{Vararg{Any,N}} seems like a mistake to me, but so does country music and I'm not going to take that away from people either.
@timholy, you had some valuable input when we made StaticInt. Any opinions?
I made it so that strides and size will return Int types. Returning statically sized axes seems to work for StaticArrays, so I left that in. I think this also makes it so that new subtypes of AbstractArray2 can just define axes and axes_types, and ArrayInterface.size and ArrayInterface.strides can catch any static info.
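A minimal sketch of what that opt-in could look like (the type and its fields are hypothetical, and the static ranges assume ArrayInterface's StaticInt range support):

```julia
using ArrayInterface
using ArrayInterface: static

# Hypothetical fixed-size 3×4 wrapper opting into AbstractArray2.
struct Fixed3x4{T} <: ArrayInterface.AbstractArray2{T,2}
    data::Matrix{T}
end
ArrayInterface.parent_type(::Type{Fixed3x4{T}}) where {T} = Matrix{T}
Base.parent(A::Fixed3x4) = A.data

# Static axes are the only custom size information; the expectation is that
# Base.size(A) then comes back as a plain (3, 4)::NTuple{2,Int}, while
# ArrayInterface.axes(A) still carries the static lengths.
ArrayInterface.axes(A::Fixed3x4) = (static(1):static(3), static(1):static(4))
```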
Implemented a generated version of Base.size_to_strides so that we don't have to rely on Base definitions. This makes it so we don't rely on Base.stride to reconstruct strides. Does NOT change any test results.
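For context, size_to_strides is just a running product of sizes; a hedged, non-@generated sketch of the same idea:

```julia
# Each stride is the running product of the preceding sizes (column-major).
_size_to_strides(s, d, sz...) = (s, _size_to_strides(s * d, sz...)...)
_size_to_strides(s, d) = (s,)
_size_to_strides(s) = ()

_size_to_strides(1, 3, 4, 5)   # (1, 3, 12) for a 3×4×5 array
```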
Some dimension-specific methods have been using `fxn(A, dim) = fxn(A)[dim]`. This doesn't work because technically there are infinite dimensions past ndims(A), but they are of size 1. So the general pattern for a static dim method is:

- put it in the type domain: `fxn(x, dim) = fxn(typeof(x), dim)`
- ensure proper dims become `StaticInt` or `Int`: `fxn(::Type{T}, dim) where {T} = fxn(T, to_dims(T, dim))`
- execute the function:

```
function fxn(::Type{T}, dim::Integer) where {T}
    if ndims(T) < dim
        return ndim_plus_default
    else
        return fxn(T)[dim]
    end
end
```

When executing on instances it's a similar pattern, but we don't need to worry about putting everything in the type domain.
Realized that this method basically checks whether strides can still be derived after indexing, which is what we need for defines_strides(::SubArray).
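In other words, something along these lines is what the check enables (a hedged illustration, assuming the usual defines_strides semantics; expected results are in the comments):

```julia
using ArrayInterface

A = rand(4, 5)

# Slicing with colons/ranges keeps a strided layout, so strides are still derivable:
ArrayInterface.defines_strides(view(A, :, 2:3))    # expected: true

# Indexing a dimension with an arbitrary vector does not:
ArrayInterface.defines_strides(view(A, [1, 3], 1)) # expected: false
```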
@chriselrod, I think this PR is in a pretty good state for a serious review now. If people ultimately decide
Looks good.
end
end

function _reshaped_axis_type(::Type{R}, dim::StaticInt) where {T,N,S,A,R<:ReinterpretArray{T,N,S,A}}
I don't know at what point inference gives up and decides it isn't worth specializing, but I've run into problems with pass-throughs like dim::StaticInt, instead of ::StaticInt{dim} to force specialization.
Do you recall if it was sufficient to just add the static parameter, like this:

function _reshaped_axis_type(::Type{R}, dim::StaticInt{D}) where {T,N,S,A,R<:ReinterpretArray{T,N,S,A},D}

Or do we need to call StaticInt(D) in between each pass?
I don't recall, but suspect it is enough.
We can handle that later. I'm making a PR to add support for BitArrays, and it'd be nice to avoid merge conflicts.

If we want to address things like dim::StaticInt (which should be failing @inferred tests if it was causing problems), we can do that in another PR.

I saw problems when these went through a chain of calls. I don't know what the heuristics are, but it's probably fine below a certain depth.
Actually, checking out this PR:

julia> using ArrayInterface: static
julia> using ArrayInterface, VectorizationBase, LoopVectorization
julia> struct NamedDimsWrapper{L,T,N,P<:AbstractArray{T,N}} <: ArrayInterface.AbstractArray2{T,N}
parent::P
NamedDimsWrapper{L}(p) where {L} = new{L,eltype(p),ndims(p),typeof(p)}(p)
end
julia> ArrayInterface.parent_type(::Type{T}) where {P,T<:NamedDimsWrapper{<:Any,<:Any,<:Any,P}} = P
julia> ArrayInterface.has_dimnames(::Type{T}) where {T<:NamedDimsWrapper} = true
julia> ArrayInterface.dimnames(::Type{T}) where {L,T<:NamedDimsWrapper{L}} = static(Val(L))
julia> function ArrayInterface.dimnames(::Type{T}, dim) where {L,T<:NamedDimsWrapper{L}}
if ndims(T) < dim
return static(:_)
else
return static(L[dim])
end
end
julia> ArrayInterface.has_dimnames(::Type{T}) where {T<:NamedDimsWrapper} = true
julia> Base.parent(x::NamedDimsWrapper) = x.parent
julia> d = (static(:x), static(:y))
(static(:x), static(:y))
julia> x = NamedDimsWrapper{d}(ones(2,2));
julia> ArrayInterface.device(x)
ArrayInterface.CPUPointer()
julia> pointer(x)
ERROR: conversion to pointer not defined for NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}}
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] unsafe_convert(#unused#::Type{Ptr{Float64}}, a::NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}})
@ Base ./pointer.jl:67
[3] pointer(x::NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}})
@ Base ./abstractarray.jl:1116
[4] top-level scope
@ REPL[13]:1
julia> stridedpointer(x)
ERROR: conversion to pointer not defined for NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}}
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] unsafe_convert(#unused#::Type{Ptr{Float64}}, a::NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}})
@ Base ./pointer.jl:67
[3] pointer(x::NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}})
@ Base ./abstractarray.jl:1116
[4] memory_reference
@ ~/.julia/dev/VectorizationBase/src/strided_pointers/stridedpointers.jl:36 [inlined]
[5] memory_reference
@ ~/.julia/dev/VectorizationBase/src/strided_pointers/stridedpointers.jl:35 [inlined]
[6] stridedpointer(A::NamedDimsWrapper{(static(:x), static(:y)), Float64, 2, Matrix{Float64}})
@ VectorizationBase ~/.julia/dev/VectorizationBase/src/strided_pointers/stridedpointers.jl:71
[7] top-level scope
@ REPL[14]:1
julia> LoopVectorization.check_args(x)
true
Maybe we should get rid of:

function device(::Type{T}) where {T<:AbstractArray}
    P = parent_type(T)
    T === P ? CPUIndex() : device(P)
end

and just return … This problem isn't specific to … I'm good with merging this though, and making this consistent in another PR.
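For completeness, the usual user-side fix for the pointer(x) error in the example above is to forward pointer conversion to the parent (a sketch, not something this PR adds):

```julia
# Forward raw-pointer conversion from the wrapper to its parent array.
Base.unsafe_convert(::Type{Ptr{T}}, A::NamedDimsWrapper{<:Any,T}) where {T} =
    Base.unsafe_convert(Ptr{T}, parent(A))
```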
Originally this was supposed to be a simple proposal for AbstractArray2, but I inevitably found a bunch of little things that needed to be fixed along the way.

AbstractArray2: Although we've designed things to work outside of a type hierarchy, we still need to manually redirect dispatch from Base to ArrayInterface in a lot of cases. This makes that easier.

I'm completely open to suggestions, so if you think I should change how stuff is named or organized, or just drop some feature, feel free to be vocal.
: Although we've designed things to work outside of a type hierarchy, we still need to manually redirect dispatch from Base to ArrayInterface in a lot of cases. This makes that easier.I'm completely open to suggestions, so if you think I should change how stuff is named, organized, or just dropping some feature, feel free to be vocal.