Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pure kwarg to map #71

Merged
merged 45 commits into from
Sep 1, 2021
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
6056598
add pure kwarg to map
bkamins Aug 13, 2021
62dcdb8
Update Project.toml
bkamins Aug 13, 2021
6745eef
Update runtests.jl
bkamins Aug 13, 2021
2de3717
Apply suggestions from code review
bkamins Aug 13, 2021
a3de2c3
Apply suggestions from code review
bkamins Aug 13, 2021
6be6bde
Update src/PooledArrays.jl
bkamins Aug 13, 2021
065629d
more efficient map for not-pure case
bkamins Aug 13, 2021
73ff1ed
Apply suggestions from code review
bkamins Aug 13, 2021
819bddf
Update src/PooledArrays.jl
bkamins Aug 13, 2021
4aef8d9
Update src/PooledArrays.jl
bkamins Aug 13, 2021
b3dc12c
Update test/runtests.jl
quinnj Aug 14, 2021
eb5f5ee
Update test/runtests.jl
quinnj Aug 14, 2021
9845443
Apply suggestions from code review
bkamins Aug 14, 2021
ef7b491
Update src/PooledArrays.jl
bkamins Aug 14, 2021
de11e9e
Update src/PooledArrays.jl
bkamins Aug 14, 2021
5d80486
Apply suggestions from code review
bkamins Aug 14, 2021
14bc92b
Update test/runtests.jl
bkamins Aug 14, 2021
75aacf9
Update test/runtests.jl
bkamins Aug 14, 2021
ef02b27
fix promotion
bkamins Aug 14, 2021
ebc4d0a
fix type chceck
bkamins Aug 14, 2021
f1b60b1
Apply suggestions from code review
bkamins Aug 14, 2021
d9a7e4a
Update src/PooledArrays.jl
bkamins Aug 14, 2021
60fdbfc
Update PooledArrays.jl
bkamins Aug 14, 2021
1716888
Apply suggestions from code review
bkamins Aug 14, 2021
8aab260
small changes
bkamins Aug 17, 2021
a6c6e65
Merge branch 'bkamins-patch-2' of https://github.com/JuliaData/Pooled…
bkamins Aug 17, 2021
a59e22a
remove type assertion
bkamins Aug 17, 2021
2f80224
Apply suggestions from code review
bkamins Aug 18, 2021
3a0b7e8
Update src/PooledArrays.jl
bkamins Aug 18, 2021
3a97346
Update src/PooledArrays.jl
bkamins Aug 18, 2021
72a6089
fix signature
bkamins Aug 18, 2021
09c2f20
Apply suggestions from code review
bkamins Aug 20, 2021
ade7029
Apply suggestions from code review
bkamins Aug 20, 2021
6acf2e8
Update src/PooledArrays.jl
bkamins Aug 20, 2021
a059bef
Update src/PooledArrays.jl
bkamins Aug 29, 2021
6a4bfa5
apply suggestions from code review
bkamins Aug 29, 2021
950d914
Apply suggestions from code review
bkamins Aug 29, 2021
5c13102
amend tests
bkamins Aug 29, 2021
4204992
fix Ti issues
bkamins Aug 31, 2021
aba380c
add Base.require_one_based_indexing test in inner constructor
bkamins Aug 31, 2021
60efc6f
fix offest checks for old Julia versions
bkamins Aug 31, 2021
b98ee8f
another fix to Julia 1.0
bkamins Aug 31, 2021
a146fa6
fix typo
bkamins Aug 31, 2021
97fe088
Apply suggestions from code review
bkamins Aug 31, 2021
0b21c63
apply review comments
bkamins Aug 31, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
name = "PooledArrays"
uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
version = "1.2.1"
version = "1.3.0"

[deps]
DataAPI = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
Expand Down
68 changes: 67 additions & 1 deletion src/PooledArrays.jl
Original file line number Diff line number Diff line change
Expand Up @@ -304,7 +304,73 @@ Base.findall(pdv::PooledVector{Bool}) = findall(convert(Vector{Bool}, pdv))
##
##############################################################################

function Base.map(f, x::PooledArray{T,R}) where {T,R<:Integer}
"""
map(f, x::PooledArray; pure::Bool=false)

Transform `PooledArray` `x` by applying `f` to each element.

If `pure=true` then `f` is applied to each element of pool of `x`
exactly once (even if some elements in pool are not present it `x`).
This will typically be much faster when the proportion of unique values
in `x` is small.
bkamins marked this conversation as resolved.
Show resolved Hide resolved
"""
function Base.map(f, x::PooledArray; pure::Bool=false)
pure && return _map_pure(f, x)
length(x) == 0 && return PooledArray(collect(Base.Generator(f, x)))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
v1 = f(x[1])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quinnj and @nalimilan - just to double check. Are we sure that PooledArray always uses 1-based indexing and even if it is multidimensional it can use a linear index?

I recall @nalimilan recently giving some comment that potentially a non-standard refs field could be used. On the other hand eachindex(IndexLinear(), ::PooledArray) falls back to the default implementation:

eachindex(::IndexLinear, A::AbstractArray) = (@inline; oneto(length(A)))
eachindex(::IndexLinear, A::AbstractVector) = (@inline; axes1(A))

I will have a look into it later if you do not have an immediate answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in theory any AbstractArray type can be used as refs, but in practice we probably haven't checked that things work for non-1-based indexing, and the fallbacks you show assume 1-based indexing, right? So I'd say it's OK to assume 1 for now and fix that later if somebody complains.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - then I have added an appropriate check in the inner constructor.

T = typeof(v1)
bkamins marked this conversation as resolved.
Show resolved Hide resolved
invpool = Dict(v1 => one(eltype(x.refs)))
pool = [v1]
labels = similar(x.refs)
labels[1] = 1
nlabels = 1
return _map_notpure(f, x, 2, invpool, pool, labels, nlabels)
end

function _map_notpure(f, xs::PooledArray,
start,
invpool::Dict{T,I},
pool::Vector{T},
labels::Array{I},
nlabels::Int) where {T, I<:Integer}
bkamins marked this conversation as resolved.
Show resolved Hide resolved
for i in start:length(xs)
vi = f(xs[i])
Ti = typeof(vi)
lbl = get(invpool, x, zero(I))
bkamins marked this conversation as resolved.
Show resolved Hide resolved
if lbl !== zero(I)
bkamins marked this conversation as resolved.
Show resolved Hide resolved
labels[i] = lbl
else
if nlabels == typemax(I) || !(Ti isa T)
if nlabels == typemax(I)
I2 = _widen(I)
else
I2 = I
end
if Ti isa T
T2 = T
else
T2 = typejoin(T, Ti)
end
bkamins marked this conversation as resolved.
Show resolved Hide resolved
nlabels += 1
invpool2 = convert(Dict{T2, I2}, invpool)
invpool2[vi] = nlabels
pool2 = convert(Vector{T2}, pool)
bkamins marked this conversation as resolved.
Show resolved Hide resolved
push!(pool2, vi)
labels2 = convert(Vector{I2}, labels)
bkamins marked this conversation as resolved.
Show resolved Hide resolved
labels2[i] = nlabels
return _map_notpure(f, xs, invpool2, pool2,
labels2, i+1, nlabels)
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end
nlabels += 1
labels[i] = nlabels
invpool[vi] = nlabels
push!(pool, vi)
end
end
return PooledArray(RefArray(labels), invpool, pool)
end

function _map_pure(f, x::PooledArray)
ks = collect(keys(x.invpool))
vs = collect(values(x.invpool))
ks1 = map(f, ks)
Expand Down
26 changes: 26 additions & 0 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -500,3 +500,29 @@ end
pa2 = repeat(pa1, inner = (2, 1))
@test pa2 == [1 2; 1 2; 3 4; 3 4]
end

@testset "map pure tests" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a few @inferred checks where applicable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add the @inferred tests elsewhere, as here nothing is inferred since now inference produces the Union of two concrete types, see:

julia> x = PooledArray(fill(1, 200), signed=true, compress=true);

julia> @inferred map(Int, x)
ERROR: return type PooledVector{Int64, Int8, Vector{Int8}} does not match inferred return type Union{PooledVector{Int64, 
Int64, Vector{Int64}}, PooledVector{Int64, Int8, Vector{Int8}}}

julia> @inferred map(Int, x, pure=true)
ERROR: return type PooledVector{Int64, Int8, Vector{Int8}} does not match inferred return type Union{PooledVector{Int64, 
Int64, Vector{Int64}}, PooledVector{Int64, Int8, Vector{Int8}}}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added @inferred with Union for map tests

x = PooledArray([1, 2, 3])
x[3] = 1
y = map(-, x, pure=true)
@test refpool(y) = [-1, -2, -3]
quinnj marked this conversation as resolved.
Show resolved Hide resolved
@test y == [-1, -2, -1]

y = map(-, x)
@test refpool(y) = [-1, -2]
quinnj marked this conversation as resolved.
Show resolved Hide resolved
@test y == [-1, -2, -1]

function f()
i = Ref(0)
return x -> (i[] -= 1; i[])
end

# the order is strange as we iterate invpool which is a Dict
y = map(f(), x, pure=true)
@test refpool(y) = [-3, -1, -2]
bkamins marked this conversation as resolved.
Show resolved Hide resolved
@test y == [-3, -1, -3]

y = map(f(), x)
@test refpool(y) = [-1, -2, -3]
bkamins marked this conversation as resolved.
Show resolved Hide resolved
@test y == [-1, -2, -3]
bkamins marked this conversation as resolved.
Show resolved Hide resolved
end