-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add pure kwarg to map #71
Changes from 34 commits
6056598
62dcdb8
6745eef
2de3717
a3de2c3
6be6bde
065629d
73ff1ed
819bddf
4aef8d9
b3dc12c
eb5f5ee
9845443
ef7b491
de11e9e
5d80486
14bc92b
75aacf9
ef02b27
ebc4d0a
f1b60b1
d9a7e4a
60fdbfc
1716888
8aab260
a6c6e65
a59e22a
2f80224
3a0b7e8
3a97346
72a6089
09c2f20
ade7029
6acf2e8
a059bef
6a4bfa5
950d914
5c13102
4204992
aba380c
60efc6f
b98ee8f
a146fa6
97fe088
0b21c63
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -131,6 +131,7 @@ _widen(::Type{UInt32}) = UInt64 | |
_widen(::Type{Int8}) = Int16 | ||
_widen(::Type{Int16}) = Int32 | ||
_widen(::Type{Int32}) = Int64 | ||
|
||
# Constructor from array, invpool, and ref type | ||
|
||
""" | ||
|
@@ -139,6 +140,8 @@ _widen(::Type{Int32}) = Int64 | |
Freshly allocate `PooledArray` using the given array as a source where each | ||
element will be referenced as an integer of the given type. | ||
|
||
`PooledArray` constructor is not type stable. | ||
|
||
If `reftype` is not specified, Boolean keyword arguments `signed` and `compress` | ||
determine the type of integer references. By default (`signed=false`), unsigned integers | ||
are used, as they have a greater range. | ||
|
@@ -304,7 +307,65 @@ Base.findall(pdv::PooledVector{Bool}) = findall(convert(Vector{Bool}, pdv)) | |
## | ||
############################################################################## | ||
|
||
function Base.map(f, x::PooledArray{T,R}) where {T,R<:Integer} | ||
""" | ||
map(f, x::PooledArray; pure::Bool=false) | ||
|
||
Transform `PooledArray` `x` by applying `f` to each element. | ||
|
||
If `pure=true` then `f` is applied to each element of pool of `x` | ||
exactly once (even if some elements in pool are not present it `x`). | ||
This will typically be much faster when the proportion of unique values | ||
in `x` is small. | ||
bkamins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
bkamins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
If `pure=false` the value returned by `map` is not type stable. | ||
bkamins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
function Base.map(f, x::PooledArray{<:Any, R, N, RA}; pure::Bool=false)::Union{PooledArray{<:Any, R, N, RA}, | ||
PooledArray{<:Any, Int, N, | ||
typeof(similar(x.refs, Int, ntuple(i -> 0, ndims(x.refs))))}} where {R, N, RA} | ||
pure && return _map_pure(f, x) | ||
length(x) == 0 && return PooledArray([f(v) for v in x]) | ||
v1 = f(x[1]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @quinnj and @nalimilan - just to double check. Are we sure that I recall @nalimilan recently giving some comment that potentially a non-standard
I will have a look into it later if you do not have an immediate answer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that in theory any There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok - then I have added an appropriate check in the inner constructor. |
||
invpool = Dict(v1 => one(eltype(x.refs))) | ||
pool = [v1] | ||
labels = similar(x.refs) | ||
labels[1] = 1 | ||
nlabels = 1 | ||
return _map_notpure(f, x, 2, invpool, pool, labels, nlabels) | ||
end | ||
|
||
function _map_notpure(f, xs::PooledArray, start, | ||
invpool::Dict{T,I}, pool::Vector{T}, | ||
labels::AbstractArray{I}, nlabels::Int) where {T, I<:Integer} | ||
for i in start:length(xs) | ||
vi = f(xs[i]) | ||
Ti = typeof(vi) | ||
lbl = get(invpool, vi, zero(I)) | ||
if lbl != zero(I) | ||
labels[i] = lbl | ||
else | ||
if nlabels == typemax(I) || Ti !== T | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right |
||
I2 = nlabels == typemax(I) ? Int : I | ||
T2 = Ti isa T ? T : Base.promote_typejoin(T, Ti) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right, I switched to |
||
nlabels += 1 | ||
invpool2 = convert(Dict{T2, I2}, invpool) | ||
invpool2[vi] = nlabels | ||
pool2 = convert(Vector{T2}, pool) | ||
bkamins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
push!(pool2, vi) | ||
labels2 = convert(AbstractArray{I2}, labels) | ||
labels2[i] = nlabels | ||
return _map_notpure(f, xs, i + 1, invpool2, pool2, | ||
labels2, nlabels) | ||
end | ||
nlabels += 1 | ||
labels[i] = nlabels | ||
invpool[vi] = nlabels | ||
push!(pool, vi) | ||
end | ||
end | ||
return PooledArray(RefArray(labels), invpool, pool) | ||
end | ||
|
||
function _map_pure(f, x::PooledArray) | ||
ks = collect(keys(x.invpool)) | ||
vs = collect(values(x.invpool)) | ||
ks1 = map(f, ks) | ||
|
@@ -601,14 +662,14 @@ _perm(o::F, z::V) where {F, V} = Base.Order.Perm{F, V}(o, z) | |
|
||
Base.Order.Perm(o::Base.Order.ForwardOrdering, y::PooledArray) = _perm(o, fast_sortable(y)) | ||
|
||
function Base.repeat(x::PooledArray, m::Integer...) | ||
function Base.repeat(x::PooledArray, m::Integer...) | ||
Threads.atomic_add!(x.refcount, 1) | ||
PooledArray(RefArray(repeat(x.refs, m...)), x.invpool, x.pool, x.refcount) | ||
end | ||
|
||
function Base.repeat(x::PooledArray; inner = nothing, outer = nothing) | ||
Threads.atomic_add!(x.refcount, 1) | ||
PooledArray(RefArray(repeat(x.refs; inner = inner, outer = outer)), | ||
PooledArray(RefArray(repeat(x.refs; inner = inner, outer = outer)), | ||
x.invpool, x.pool, x.refcount) | ||
end | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -500,3 +500,56 @@ end | |
pa2 = repeat(pa1, inner = (2, 1)) | ||
@test pa2 == [1 2; 1 2; 3 4; 3 4] | ||
end | ||
|
||
@testset "map pure tests" begin | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a few There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I add the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added |
||
x = PooledArray([1, 2, 3]) | ||
x[3] = 1 | ||
y = map(-, x, pure=true) | ||
@test refpool(y) == [-1, -2, -3] | ||
@test y == [-1, -2, -1] | ||
|
||
y = map(-, x) | ||
@test refpool(y) == [-1, -2] | ||
@test y == [-1, -2, -1] | ||
|
||
function f() | ||
i = Ref(0) | ||
return x -> (i[] -= 1; i[]) | ||
end | ||
|
||
# the order is strange as we iterate invpool which is a Dict | ||
# and it depends on the version of Julia | ||
y = map(f(), x, pure=true) | ||
d = Dict(Set(1:3) .=> -1:-1:-3) | ||
@test refpool(y) == [d[i] for i in 1:3] | ||
@test y == [d[v] for v in x] | ||
|
||
y = map(f(), x) | ||
@test refpool(y) == [-1, -2, -3] | ||
@test y == [-1, -2, -3] | ||
bkamins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
x = PooledArray([1, missing, 2]) | ||
y = map(identity, x) | ||
@test isequal(y, [1, missing, 2]) | ||
@test typeof(y) === PooledVector{Union{Missing, Int}, UInt32, Vector{UInt32}} | ||
|
||
x = PooledArray([1, missing, 2], signed=true, compress=true) | ||
y = map(identity, x) | ||
@test isequal(y, [1, missing, 2]) | ||
@test typeof(y) === PooledVector{Union{Missing, Int}, Int8, Vector{Int8}} | ||
|
||
x = PooledArray(fill(1, 200), signed=true, compress=true) | ||
y = map(f(), x) | ||
@test y == -1:-1:-200 | ||
@test typeof(y) === PooledVector{Int, Int, Vector{Int}} | ||
|
||
x = PooledArray(reshape(fill(1, 200), 2, :), signed=true, compress=true) | ||
y = map(f(), x) | ||
@test y == reshape(-1:-1:-200, 2, :) | ||
@test typeof(y) === PooledMatrix{Int, Int, Matrix{Int}} | ||
|
||
x = PooledArray(fill("a"), signed=true, compress=true) | ||
y = map(f(), x) | ||
@test y == fill(-1) | ||
@test typeof(y) === PooledArray{Int, Int8, 0, Array{Int8, 0}} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about explaining why that's the case, making the link with the next paragraph? And isn't the constructor type stable when
reftype
is specified?BTW, maybe we should mark the constructor as
@inline
to ensure that when the default value of keyword arguments is used, the return type is inferred?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added inline. I have rewritten the docstring. Indeed when
reftype
is passed the constructor is type stable.