Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optimised methods for reducing(hcat/vcat on any iterators of vectors #31644

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
99 changes: 98 additions & 1 deletion base/reduce.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,10 @@
# This file is a part of Julia. License is MIT: https://julialang.org/license

## Helpers
eltype_or_default_eltype(itr::T) where T = eltype_or_default_eltype(itr, IteratorEltype(T))
eltype_or_default_eltype(itr::T, ::HasEltype) where T = eltype(T)
eltype_or_default_eltype(itr, ::EltypeUnknown) = @default_eltype(itr)

## reductions ##

###### Generic (map)reduce functions ######
Expand Down Expand Up @@ -354,10 +359,102 @@ julia> reduce(*, [2; 3; 4]; init=-1)
-24
```
"""
reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...)
function reduce(op, itr::T; kw...) where T
# Redispatch, adding traits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this comment, it is literally what the code below it does.

reduce(op, itr, eltype_or_default_eltype(itr), IteratorSize(T); kw...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a private _reduce method.

Copy link
Contributor Author

@oxinabox oxinabox Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unless it should be a private _mapreduce method.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess _mapreduce is better if you can support that, since that will also support mapreduce(identity, ...).

end

function reduce(op, itr, et, isize; kw...)
# Fallback: if nothing interesting is being done with the traits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this comment can likely be removed.

# or the operation
return mapreduce(identity, op, itr; kw...)
end

reduce(op, a::Number) = a # Do we want this?

##### Operation specific reduce optimisations

## vcat

function reduce(::typeof(vcat), xs, T::Type{<:AbstractVector{V}}, isize) where V
x_state = iterate(xs)
x_state === nothing && return reduce_empty(vcat, T)
x1, state = x_state

ret = Vector(x1)

hinted_size = 0
if !(isize isa SizeUnknown)
# Assume first element has representitive size, unless that would make this too large
SIZEHINT_CAP = 10^5
hinted_size = min(SIZEHINT_CAP, length(xs)*length(x1))
sizehint!(ret, hinted_size)
end

x_state = iterate(xs, state)
while(x_state !== nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while(x_state !== nothing)
while x_state !== nothing

x, state = x_state
append!(ret, x)
x_state = iterate(xs, state)
end

if length(ret) < hinted_size/2 # it is only allowable to be at most 2x too much memory
sizehint!(ret, length(ret))
end

return ret
end

## hcat

function reduce(::typeof(hcat), xs, T::Type{<:AbstractVector{V}}, isize::SizeUnknown) where V
x_state = iterate(xs)
x_state === nothing && return reduce_empty(hcat, T)
x1, state = x_state

dim1_size = length(x1)
dim2_size = 1
ret_vec = Vector(x1)

x_state = iterate(xs, state)

while(x_state !== nothing)
x, state = x_state
append!(ret_vec, x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know this will fit into ret?

julia> reduce(vcat, (UInt8[1,2], [1000, 5000]))
ERROR: InexactError: trunc(UInt8, 1000)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dim2_size += 1
x_state = iterate(xs, state)
end

# Reshape will throw errors if anything was the wrong size
return reshape(ret_vec, (dim1_size, dim2_size))
end

function reduce(::typeof(hcat), xs, T::Type{<:AbstractVector{V}}, isize) where V
# Size is known
x_state = iterate(xs)
x_state === nothing && return reduce_empty(hcat, T)
x1, state = x_state

dim1_size = size(x1,1)
dim2_size = length(xs)

ret = similar(x1, eltype(x1), (dim1_size, dim2_size))
copyto!(ret, 1, x1, 1)

x_state = iterate(xs, state)
offset = length(x1)+1
while(x_state !== nothing)
x, state = x_state
length(x)==dim1_size || throw(DimensionMismatch("hcat"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should include dimensions in error message

copyto!(ret, offset, x, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know this will fit into ret?

julia> reduce(hcat, (UInt8[1,2], [1000, 5000]))
ERROR: InexactError: trunc(UInt8, 1000)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't this function needs to be tightenned to

T::Type{<:AbstractVector{S}}

And things that fell back though @default_eltype need to be banned from using it.

And then a nother (marginally slower, but benchmarking will tell)
version needs to be created that can deal with hetrogenous containers.
I would say that could be in another PR and I just want to common case covered here,
but actually i would be sad if can't get speedup for generators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that could be in another PR and I just want to common case covered here,

I don't understand, this PR breaks the use cases I linked so how can it be in another PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was unclear.
I mean definitately fixing that issue.
But the question of handling things like

(i for i in ([0x1, 0x2], [1,2]))

does not have to be handled, in this PR (I think it should, if it doesn't add undue complexity).
If we just tighten the type signature,
from where T<:AbstractVector to where T<:AbstractVector{S} where S.
That would solve your case by causing it to fallback to the current slow reduce methods.

Thus working out how to actually efficently handle hetrogeneous types could be in a PR;
if it was going to be hard.
But I don't think it will so it can be in this one.

offset += length(x)
x_state = iterate(xs, state)
end

return ret
end


###### Specific reduction functions ######

## sum
Expand Down
2 changes: 1 addition & 1 deletion base/reducedim.jl
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ julia> reduce(max, a, dims=1)
4 8 12 16
```
"""
reduce(op, A::AbstractArray; kw...) = mapreduce(identity, op, A; kw...)
reduce(op, A::AbstractArray; kw...)

##### Specific reduction functions #####
"""
Expand Down
21 changes: 21 additions & 0 deletions test/reduce.jl
Original file line number Diff line number Diff line change
Expand Up @@ -526,6 +526,27 @@ test18695(r) = sum( t^2 for t in r )
end
end

@testset "optimized reduce(vcat/hcat, A) for iterators" begin
v_v_same = [rand(128) for ii in 1:100]
# the following 2 are not optimized, but we want to make sure
# that they still hit the normal reduce methods
v_v_diff = [rand(128), rand(Float32,128), rand(Int, 128)]
v_v_diff_typed = Union{Vector{Float64},Vector{Float32},Vector{Int}}[rand(128), rand(Float32,128), rand(Int, 128)]

for v_v in (v_v_same, v_v_diff, v_v_diff_typed)
# Cover all combinations of iterator traits.
g_v = (x for x in v_v)
f_g_v = Iterators.filter(x->true, g_v)
f_v_v = Iterators.filter(x->true, v_v);
hcat_expected = hcat(v_v...)
vcat_expected = vcat(v_v...)
@testset "$(typeof(data))" for data in (v_v, g_v, f_g_v, f_v_v)
@test reduce(hcat, data) == hcat_expected
@test reduce(vcat, data) == vcat_expected
end
end
end

# offset axes
i = Base.Slice(-3:3)
x = [j^2 for j in i]
Expand Down