WIP: The Plan #1
Cc: @blegat, @juan-pablo-vielma
I plan to continue adding to this as soon as I make some progress on my current task. |
Named Dimensions
These are wanted mostly for neural networks, to make sure that you are summing, batching, etc. over the dimension you think you are.
Most of this functionality should be implemented in a kind of traitish way so that |
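A minimal sketch of the "sum over the dimension you think you are" idea, using only Base. `NamedSketch`, `dimindex` and `sumover` are made-up names for illustration and are not the NamedDims.jl API:

```julia
# Hypothetical wrapper: an array plus one Symbol per dimension.
struct NamedSketch{A<:AbstractArray,L}
    data::A
    names::L   # tuple of Symbols, e.g. (:feature, :channel, :batch)
end

# Translate a dimension name into its integer position, failing loudly on a typo.
function dimindex(A::NamedSketch, name::Symbol)
    i = findfirst(==(name), A.names)
    i === nothing && throw(ArgumentError("no dimension named $name in $(A.names)"))
    return i
end

# Reduce over a dimension chosen by name instead of by position.
sumover(A::NamedSketch, name::Symbol) = sum(A.data; dims=dimindex(A, name))

x = NamedSketch(reshape(collect(1.0:24.0), 2, 3, 4), (:feature, :channel, :batch))
size(sumover(x, :batch))   # (2, 3, 1)
```

Asking for a name that does not exist, e.g. `sumover(x, :time)`, throws instead of silently reducing over the wrong axis, which is the class of bug named dimensions are meant to catch.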
I just started looking through the NamedDims repo, so I apologize if this is already documented but I missed it. Is NamedDims ultimately intended to be integrated into AxisArrays, or is it supposed to be a dependency, or something else entirely? Also, I think https://github.com/JuliaDiffEq/LabelledArrays.jl implements a really simple method of indexing for symbols that can be adapted to indices. I know the comments in https://github.com/invenia/NamedDims.jl/blob/master/src/name_core.jl#L66 indicate it can run at compile time, but I haven't really stress-tested NamedDims for speed yet. I have a mostly finished version of all the statically typed ranges in Base, tested against a lot of the range tests in Base: https://github.com/Tokazama/StaticRanges.jl. I'm hoping to figure out a mix of dynamic and static indexing in a week or two when classes finish up here. I figured this may be helpful given how much of AxisArrays's code is dedicated to figuring out indexing.
NamedDims.jl is intended to A) be used on its own, i.e. if you only want named dimensions and nothing else.
Probably. It is a different thing again to either of the packages we're talking about (as I think you know).
As of
This looks interesting and might fill some or all of the pieces that the proposed Indexes.jl wants to do. In general I am currently focusing on NamedDims, because it solves the bugs I had in my code related to various incorrect dimensions being summed and then std dev'd over. Indexes.jl will solve the other kinds of mistakes.
On Monday, @oxinabox pointed to https://github.com/andyferris/AcceleratedArrays.jl as a candidate for the index lookup aspect and I think it's a great choice and I've been running with it on what I'm calling IndexedDims.jl. My goal for the JuliaCon hackathon tomorrow is to get a reasonable prototype of the indexed dimensions aspect functional, which may not have all of its necessary performance optimizations and may not actually overload getindex yet. Key issues that require attention with this approach:
It's great to see this issue and some progress on replacing |
This might be of interest: https://github.com/rafaqz/DimensionalData.jl It's a mix of AxisArrays.jl and NamedDims.jl. Basically, as @c42f said above, the things I wanted aren't possible without a serious refactor of AxisArrays. Mostly it's easier to extend and the syntax is more flexible for use in other packages. I wrote it for spatial data, and it's extended in https://github.com/rafaqz/GeoData.jl Edit: The list of concepts above is basically identical to the concepts in DimensionalData.jl. The main point of difference is the stress on always using abstract types and method dispatch in DimensionalData: inherited behaviour instead of wrapper types. A few other points relating to the list of @iamed2 and DimensionalData:
It's a little verbose in the last case, but I can't think of how to simplify it while maintaining the flexibility. Also, I've just registered DD as I have a whole chain of packages depending on it, but it would be great not to maintain it on my own in future, and I'm keen to integrate the functionality into a future AxisArrays if it's possible.
@rafaqz DimensionalData looks nice, have been playing around with it a bit. I've started a discussion over at AcceleratedArrays - andyferris/AcceleratedArrays.jl#4 - to try and make rebuilding, slicing and dicing AcceleratedArrays cheaper. I currently have my own shoddy equivalents of |
IndexedDims.jl is currently building on AcceleratedArrays.jl |
yes, exactly! it's an exciting time to be an accelerated array. |
@iamed2 on this topic, I notice you have a (probably zero-overhead) |
@kcajf in DimensionalData getindex is just vanilla getindex, to preserve the interface. Dimension or selector wrapper types trigger other behaviours, as you are suggesting. I think this has to be the default, and it's one of the biggest problems with other packages. It also means wrapper types work even if you don't inherit from AbstractDimensionalArray, because dispatch isn't on the array type but the wrapper. Edit: yes, getting the syntax for these wrapper types right is the hardest part. I've been trying to find the shortest, simplest words for them, but I wish we had better piping operators. A better variety of ASCII infix ops would also help as sugar for type wrapper syntax.
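To make the "dispatch is on the wrapper, not the array type" point concrete, here is a rough Base-only sketch. `AtSketch`, `NearSketch`, `toindex` and `select` are hypothetical names chosen for illustration; DimensionalData.jl defines its own selectors and its real implementation differs:

```julia
# Hypothetical selector wrappers.
abstract type SelectorSketch end

struct AtSketch{T} <: SelectorSketch
    val::T
end

struct NearSketch{T} <: SelectorSketch
    val::T
end

# Anything that is not a selector passes through, so ordinary indexing is unchanged.
toindex(lookup, i) = i

# Exact match against the lookup values of one dimension.
function toindex(lookup, sel::AtSketch)
    i = findfirst(==(sel.val), lookup)
    i === nothing && throw(KeyError(sel.val))
    return i
end

# Closest lookup value to the requested one.
toindex(lookup, sel::NearSketch) = argmin(abs.(lookup .- sel.val))

# Works for any AbstractArray, because the selector type drives dispatch.
select(A::AbstractArray, lookups::Tuple, I...) = A[map(toindex, lookups, I)...]

A = reshape(1:12, 3, 4)
lookups = ([10.0, 20.0, 30.0], [:a, :b, :c, :d])

select(A, lookups, 2, 3)                             # plain indices
select(A, lookups, NearSketch(21.5), AtSketch(:c))   # same element via selectors
```

Note that `A` here is an ordinary reshaped range with no special supertype, which is the property described above: the array never needs to know that selectors exist.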
One thing that has yet to be discussed is a meta package. The most obvious benefits to a meta package are:
What do you think would be in a meta package? |
The only thing that I think is absolutely necessary is an array type with the named dimensions and indexing capabilities combined from all the basic packages. So if we were using IndexedDims and NamedDims there could be some sort of exported constant that wraps an Everything else I can think of would mostly be tying syntax together, like making use of well-thought-out syntax found in DimensionalData. It would be nice if we could use things like
Ok, I think we have some quite different approaches here, but I am quite interested to see which is the most practical. First: do you mean a concrete or abstract array type? I'm concerned that using concrete types anywhere will create problems for extensibility. All of the methods that matter in DimensionalData are on abstract types, and the concrete array type is basically just an example. The AbstractDimensionalArray type provides the inherited behaviour in GeoData.jl, for both AbstractGeoArray and AbstractGeoSeries. Another difference is I'm avoiding array wrappers and connecting indexing modes to the array type in any way. Most of the time (like view/getindex etc.) the AbstractDimensionalArray inheritance isn't the object controlling dispatch: the AbstractDimension and Selector are. With some extra work you can use half the DD package without any inheritance (or wrappers) on the array type. I was planning for indexing behaviour such as using accelerated arrays to be located entirely in the dimension types, not the array! I think it's much simpler and more modular if array types don't 'know' that various indexing types even exist. GeoArray in GeoData.jl has no interaction with dimension/selector indexing, even inherited - dispatch happens on
Concrete, but type-parameterized. It can all work the same on abstract stuff, I guess. The reason to make it abstract is if someone decides to have variants on it.
Yeah that makes sense, although those examples would work equally well if they were concrete types inheriting from an abstract type and it's not a huge overhead to maintain. I pretty much wrote DimensionalData.jl because I had a use case for inheriting indexing behaviour and AxisArrays couldn't manage it. In GeoData.jl AbstractGeoArray and AbstractGeoSeries both inherit from DimensionalArray, but have quite different roles and behaviours that are differentiated using dispatch. But the indexing is identical, as they have no interaction with it besides having a dims field they return when Then AbstractGeoStack isn't even in AbstractArray but it and its descendants also use dim/selector indexing. In this case extensibility is via defining a I can't see how to do these things cleanly with concrete array wrappers. There ends up being too much going on and dispatch and plot recipes get complicated. And it obviously can't work for non-arrays. Edit: There are also cases where you want to lazily load the dimension arrays (ie when they are big matrices in an online database) and having dimensions handled by a concrete wrapper from another package makes that difficult. Inheriting indexing behaviours and adding a lazy |
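The inheritance pattern being argued for can be sketched roughly as follows. Every name here (`AbstractDimSketch`, `parentdata`, `dimnames`, `namedsize`, and both concrete types) is hypothetical and only stands in for the real DimensionalData.jl and GeoData.jl types:

```julia
# All names here are hypothetical; this only sketches the inheritance pattern.
abstract type AbstractDimSketch{T,N} <: AbstractArray{T,N} end

# Interface functions that each concrete subtype implements.
function parentdata end
function dimnames end

# Shared behaviour, written once against the abstract type.
Base.size(A::AbstractDimSketch) = size(parentdata(A))
Base.getindex(A::AbstractDimSketch, I::Int...) = parentdata(A)[I...]

function namedsize(A::AbstractDimSketch, name::Symbol)
    d = findfirst(==(name), dimnames(A))
    d === nothing && throw(ArgumentError("no dimension named $name"))
    return size(A, d)
end

# Two unrelated concrete types that get the behaviour above for free.
struct GridSketch{T,N} <: AbstractDimSketch{T,N}
    data::Array{T,N}
    names::NTuple{N,Symbol}
end
parentdata(A::GridSketch) = A.data
dimnames(A::GridSketch) = A.names

struct LabelledGridSketch{T,N} <: AbstractDimSketch{T,N}
    data::Array{T,N}
    names::NTuple{N,Symbol}
    label::String   # extra field the shared code never needs to know about
end
parentdata(A::LabelledGridSketch) = A.data
dimnames(A::LabelledGridSketch) = A.names

g = GridSketch(zeros(2, 3), (:x, :y))
s = LabelledGridSketch(ones(4, 5), (:lat, :lon), "elevation")
namedsize(g, :y), namedsize(s, :lat)
```

Each subtype only supplies two one-line accessor methods; the named-dimension logic lives on the abstract type, so a new array type in another package does not need a concrete wrapper from this one.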
I have no idea what part of that issue is not already complete. |
Maybe the follow-up to that issue is to wonder what happens next to keywords. Should For example, strings seem to pass along just fine:
Basically, the standard for overloading That would be worth its own issue on JuliaLang/Julia
Actually this seems quite messy, how should the named equivalent of |
Yes, doesn't work great for some. It works well for things that don't change indexing, like |
Now I’m struggling to think of a type simple enough for this to work... I don’t think
Probably not, since something that should return a row vector will return a column vector. |
@mcabbott I agree A4[iter=Near(21.5), var=1] # doesn’t work but could That could be added pretty easily and is easy to write, and I've thought about it a lot. The problem is we would lose dispatch on dimensions with regular indexing. Dispatch on But I agree the Edit: another option is to add a Symbol method to the I'm not totally comfortable using the pipe like that, and unfortunately the range of right-associative operators isn't amazing.
Why must it be right-associative? I guess you can have |
It needs to be right-associative, otherwise you'll need brackets for ranges. Yes I also wouldn't say "too early", it's just a matter of priorities. Mine is to hide the dims mechanics from implementations, so I remove dims as early as possible so implementations only see regular indexing.
OK. My point re dispatch was this: I thought the point of being able to dispatch on |
Small update In the process of registering StaticRanges.jl which will hopefully be followed up with SortedArrays.jl. The first started off as an entirely unrelated project to this but I realized that an implementation that used anything other than 1-based indexing would need to reconstruct the entire array or have a mutable struct and reconstruct the axes when concatenating, resizing, etc. |
Maybe this is ready for a link here: AxisRanges.jl defines another thin wrapper, which works with NamedDims.jl for names. The two should commute, apart from bugs. It’s callable so that Round brackets look up indices via To allow I had a go at making it work with Tables.jl, see what you think. It's not so obvious what the equivalent of Edit: broadcasting now works, as do |
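For readers unfamiliar with the callable-wrapper idea, here is a small Base-only sketch of "square brackets index by position, round brackets look up by key". `KeyedVectorSketch` is a made-up type for illustration and is not how AxisRanges.jl is implemented:

```julia
# Hypothetical wrapper holding the values and one key per element.
struct KeyedVectorSketch{T,K}
    data::Vector{T}
    keys::Vector{K}
end

# Square brackets: ordinary positional indexing.
Base.getindex(A::KeyedVectorSketch, i::Int) = A.data[i]

# Round brackets: making the struct callable turns A(key) into a key lookup.
function (A::KeyedVectorSketch)(key)
    i = findfirst(isequal(key), A.keys)
    i === nothing && throw(KeyError(key))
    return A.data[i]
end

v = KeyedVectorSketch([10, 20, 30], [:a, :b, :c])
v[2]    # 20, by position
v(:b)   # 20, by key
```

Because the two lookups use different syntax, integer keys are never confused with integer positions.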
I've been thinking about the problem of indexing by We could have normal indexing by an
julia> A = AxisArray2(reshape(1:9, 3,3),
2:4, # keys(axes(A, 1))
3.0:5.0) # keys(axes(A, 2))
julia> A[1,1]
1
julia> A[==(2),==(3.0)]
1
julia> A[1:2,1:2]
2×2 Array{Int64,2}:
1 4
2 5
julia> A[<(4), <(5.0)]
2×2 Array{Int64,2}:
1 4
2 5
I really like this example because:
One possibly weird feature of using functions like this is that in your example In AxisRanges right now, One concern is that you can't have |
If there's no performance or functionality cost to using round brackets then that's a better idea. |
I'm not particularly fond of supporting anonymous functions like In my head it seems to be parallel to the behavior of If this is too convoluted then I'm probably going down the wrong path though. |
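The function-based key lookup discussed in the last few comments can be approximated with a few lines of Base Julia. This is only a sketch under assumptions: `KeyedSketch` and `keyindex` are hypothetical names, and the rule that `==(x)` returns a single element while other predicates return a slice is inferred from the REPL example above rather than taken from any package:

```julia
# Hypothetical wrapper: a matrix plus keys for its rows and columns.
struct KeyedSketch{D<:AbstractMatrix,R,C}
    data::D
    rowkeys::R
    colkeys::C
end

# Integers, ranges, and so on are plain positional indices.
keyindex(keys, i) = i

# A general predicate selects every position whose key satisfies it.
keyindex(keys, f::Function) = findall(f, keys)

# ==(x) is treated as a scalar lookup: the first matching position.
function keyindex(keys, f::Base.Fix2{typeof(==)})
    i = findfirst(f, keys)
    i === nothing && throw(ArgumentError("no key matches"))
    return i
end

Base.getindex(A::KeyedSketch, i, j) =
    A.data[keyindex(A.rowkeys, i), keyindex(A.colkeys, j)]

A = KeyedSketch(reshape(1:9, 3, 3), 2:4, 3.0:5.0)
A[1, 1]               # positional
A[==(2), ==(3.0)]     # same element, looked up by key
A[<(4), <(5.0)]       # 2×2 slice selected by key
```

One design consequence, also raised in the thread: an array whose elements or keys are themselves functions cannot be told apart from a predicate under this scheme.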
I have AxisIndices.jl up and running now. It implements the syntax I have proposed above, but I've also tried to make it easy to customize behavior. It should be trivial to implement a type that indexes keys using round brackets if people want to experiment with that. Hopefully, this will make it so people don't have to waste time reimplementing all of the methods unrelated to indexing for a new array type (e.g.,
Just an update for DimensionalData.jl. You can now use named syntax, for everyone who prefers that:
julia> A = DimArray(rand(10, 20, 30), (:a, :b, :c));
julia> A[a=2:5, c=9]
DimArray with dimensions:
Dim{:a}: 2:5 (NoIndex)
Dim{:b}: Base.OneTo(20) (NoIndex)
and referenced dimensions:
Dim{:c}: 9 (NoIndex)
and data: 4×20 Array{Float64,2}
0.868237 0.528297 0.32389 … 0.89322 0.6776 0.604891
0.635544 0.0526766 0.965727 0.50829 0.661853 0.410173
0.732377 0.990363 0.728461 0.610426 0.283663 0.00224321
0.0849853 0.554705 0.594263 0.217618 0.198165 0.661853
You can also use
julia> A = DimArray(rand(3, 3), (cat=Val((:a, :b, :c)),
val=Val((5.0, 6.0, 7.0))))
DimArray with dimensions:
Dim{:cat}: Val{(:a, :b, :c)}() (Categorical: Unordered)
Dim{:val}: Val{(5.0, 6.0, 7.0)}() (Categorical: Unordered)
and data: 3×3 Array{Float64,2}
0.993357 0.765515 0.914423
0.405196 0.98223 0.330779
0.365312 0.388873 0.88732
julia> @btime A[1, 2]
22.934 ns (1 allocation: 16 bytes)
0.9939444595885871
julia> @btime A[:a, 7.0]
26.333 ns (1 allocation: 16 bytes)
0.32927504968939925
julia> @btime A[cat=:a, val=7.0]
31.920 ns (2 allocations: 48 bytes)
0.7476441117572306
The There are more examples in the readme
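For anyone wondering how the named `A[a=2:5, c=9]` syntax can work at all: keyword arguments inside square brackets are passed to `getindex` as keywords. The following is a minimal Base-only sketch of that mechanism; `NamedIndexSketch` is a hypothetical type, and unlike DimArray it returns a plain array rather than rewrapping the result:

```julia
# Hypothetical wrapper: an array plus one Symbol per dimension.
struct NamedIndexSketch{A<:AbstractArray,L}
    data::A
    names::L
end

function Base.getindex(A::NamedIndexSketch; kw...)
    # Start from `:` for every dimension, then fill in the named ones.
    inds = Any[Colon() for _ in A.names]
    for (name, i) in pairs(kw)
        d = findfirst(==(name), A.names)
        d === nothing && throw(ArgumentError("no dimension named $name"))
        inds[d] = i
    end
    return A.data[inds...]
end

A = NamedIndexSketch(rand(10, 20, 30), (:a, :b, :c))
size(A[a=2:5, c=9])   # (4, 20)
```

Dimensions that are not mentioned are kept whole, which is why the result has the same 4×20 shape as the DimArray output shown above.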
Are there any updates for this as of April 2021? As an avid xarray user (and novice Julia user), it's hard to figure out whether this discussion has converged, died out, or moved elsewhere (Discourse?)... My guess is that it's still up in the air, considering YAXArrays exists, and nearly all the above are still being developed. I've yet to find anything I really love from the user's point of view, except maybe this comment above. Since many like
For fun (and learning Julia more in-depth), I tried my own hand at implementing a mock-up... it's nowhere near as cool as all your folks' packages, with your no-overhead compile-time stuff. 😉 |
It hasn't really converged. A few of us have continued working on packages fleshing out functionality for different use cases, but borrowing each other's ideas. Both DimensionalData.jl and AxisKeys.jl do most of what you mention here, and get a lot of use. @Tokazama is working on generalizations of some of these ideas in ArrayInterface.jl and SpatioTemporalTraits.jl, which is very interesting work, and might end up tying some of these packages together. And not loving things isn't very useful feedback. What specific things don't work for you in available packages? What do you want to do that you can't?
As rafaqz said, I'm working on standardizing a lot of the interface for this kind of stuff. I really wish I had more to say at this point, but I really need to get this stuff in place for discussion with those developing other packages before addressing general user interface stuff (mainly documented interfaces with competitive benchmark times across all indexing types to support the change). I wish I could give an estimated time, but I can't seem to finish projects faster than they are assigned to me lately. Hopefully I can start hitting this hard again in the next couple of weeks.
This issue is still being written
See discussion in JuliaArrays/AxisArrays.jl#84 for background
Concepts
AxisArrays.jl is currently a mashup of distinct and mostly orthogonal indexing behaviour. These are, at the highest level:
Named Dimensions
Index Lookup
Indexed Dimensions
getindex
Translation