setindex, push and convenience constructor #12
Interesting. On one hand I worry this violates some of the basic assumptions of AbstractArrays (getting and setting values should be fast), OTOH this does seem useful. On balance I'm leaning in favor, but would be happy if other JuliaArrays contributors chimed in. Out of curiosity, how big is the
When working with sparse and distributed arrays I learned to take the basic assumptions of
My own use cases for these kinds of arrays have typically been statistics, i.e. what is called a
I asked because I wondered if an inverse-mapping Dict might be better. But at less than 10 I suspect linear search would be hard to beat. We could also add another type that does this via a Dict (or static perfect-hashing algorithm), if the need arises. Unless others object, I'm personally willing to take this. Thanks for stepping forward.
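The trade-off discussed here can be sketched as follows. This is only an illustration: `vals`, `linear_lookup`, `inverse`, and `dict_lookup` are hypothetical names and are not part of IndirectArrays.

```julia
# Hypothetical illustration of the two lookup strategies; not package code.
vals = ["low", "mid", "high"]

# Linear search: O(length(vals)) per lookup, no extra storage.
linear_lookup(vals, x) = findfirst(isequal(x), vals)

# Inverse mapping: expected O(1) per lookup, but the Dict must be built
# up front and kept in sync whenever `vals` changes.
inverse = Dict(v => i for (i, v) in enumerate(vals))
dict_lookup(inverse, x) = get(inverse, x, nothing)

linear_lookup(vals, "high")   # position of "high" in vals
dict_lookup(inverse, "high")  # same position, found via the Dict
```

For a handful of distinct values, the linear scan avoids hashing and extra memory, which is why it is expected to be hard to beat at that size.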
89fe7f3 to 6ad8e19
Codecov Report
@@           Coverage Diff           @@
##           master    #12    +/-    ##
=====================================
  Coverage     100%   100%
=====================================
  Files           1      1
  Lines          16     38    +22
=====================================
+ Hits           16     38    +22
b86df94 to 66906c6
I've added some tests so I think this is ready. Some of these methods can probably be optimized but for most applications, I only think a fairly fast
66906c6 to b637a1b
@timholy Bump. This is blocking a Gadfly update so it would be great to get merged soon. Please let me know if it needs changes.
If folks object to this, am I correct in thinking that Gadfly could import and extend IndirectArrays instead?
Bump
bump
Given that @timholy said he was OK on the principle, I'm going to (ab)use my powers and merge this unless somebody objects. It's been blocking the port of Gadfly to DataFrames 0.11 for too long.
src/IndirectArrays.jl
Outdated
@inline function Base.setindex!(A::IndirectArray, x, i::Int)
    @boundscheck checkbounds(A.index, i)
    idx = findfirst(A.values, x)
    if idx == 0 || idx == nothing # findfird changed in 0.7
"findfirst". Could also use Compat.findfirst now.
Indeed. I've updated the PR.
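For readers on Julia 0.7 or later, where `findfirst` takes a predicate and returns `nothing` when there is no match, the lookup step above might look like the sketch below. `ToyIndirect` and `toy_setindex!` are hypothetical stand-ins, not the package's code, and appending an absent value to `values` is an assumption, since the visible part of the diff does not show that branch.

```julia
# Hypothetical stand-in for IndirectArray, reduced to the two fields used in the diff.
struct ToyIndirect{T}
    index::Vector{Int}
    values::Vector{T}
end

# Store x at position i: reuse its slot in `values` if present,
# otherwise append it (assumed behaviour, not confirmed by the diff).
function toy_setindex!(A::ToyIndirect, x, i::Int)
    checkbounds(A.index, i)
    idx = findfirst(isequal(x), A.values)  # `nothing` when x is absent (Julia >= 0.7)
    if idx === nothing
        push!(A.values, x)
        idx = length(A.values)
    end
    A.index[i] = idx
    return A
end

A = ToyIndirect([1, 2, 1], ["a", "b"])
toy_setindex!(A, "c", 2)  # "c" is new, so it is appended to A.values
toy_setindex!(A, "b", 3)  # "b" already exists, so only the index entry changes
```

Comparing with `=== nothing` covers the 0.7+ behaviour directly, so the `idx == 0` branch is only needed when supporting older versions.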
src/IndirectArrays.jl
Outdated
@@ -26,10 +28,19 @@ end
Base.@propagate_inbounds IndirectArray(index::AbstractArray{<:Integer,N}, values::AbstractVector{T}) where {T,N} =
    IndirectArray{T,N,typeof(index),typeof(values)}(index, values)

function (::Type{IndirectArray{T}})(A::AbstractArray, values::AbstractVector = unique(A)) where {T}
    # Use map! to make sure that index is an Array
    index = map!(t -> findfirst(values, t), Array{T}(size(A)...), A)
Sorry, I missed this. Much better to use indexin(A, values) here AFAICT. Not sure what should happen if some values of A do not appear in values: is throwing an error OK?
For maximal performance, it would even make sense to use values = nothing by default, and compute unique values using the same dict as the one created by indexin.
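A minimal sketch of the `indexin` suggestion, assuming that throwing an error is the desired behaviour for elements missing from the value list (the question is left open above). `build_index` is a hypothetical helper, not the package's constructor.

```julia
# Hypothetical helper illustrating the indexin-based approach; not package code.
function build_index(A::AbstractArray, vals::AbstractVector = unique(A))
    idx = indexin(A, vals)  # `nothing` marks elements of A that are missing from vals
    if any(i -> i === nothing, idx)
        throw(ArgumentError("some elements of A do not appear in vals"))
    end
    return Array{Int}(idx), vals  # narrow the element type to plain Int
end

index, vals = build_index(["b", "a", "b"])   # vals defaults to unique(A)
index2, _ = build_index(["b", "a", "b"], ["a", "b"])
```

`indexin` builds a single lookup table over `vals`, so the whole index is computed in one pass instead of one linear `findfirst` scan per element.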
I've tagged a release: JuliaLang/METADATA.jl#13782
I'm not sure if you want this package to be minimal or if you are okay with adding some functionality like this. If you are okay with adding stuff, I'll finish up the PR. I think this could be useful as an internal replacement for the now deprecated PooledDataArrays in Gadfly, but then some extra functionality similar to the methods defined here is needed.