Initial Roadmap #4
Comments
I've made WIP PRs for PooledArrays and WeakRefStrings, which are enough to remove both as StructArrays dependencies in exchange for DataAPI. I was wondering: shouldn't placeholders for the Tables interface also be here? Say |
Alright, I finally had some time to look at this. I really like I don't have an opinion about the I'm not a fan of I wonder whether one can avoid Also no opinion about I really don't like that all of these things that seem pretty unrelated to me are just put into one generic package. For example, the My understanding of the two So I think for the DataValues.jl situation, maybe we should investigate the following two options next instead: |
Yes and no. This is generally agreed upon as the best (i.e. simplest) way to handle optional dependencies at the moment, based on the extensive discussion in a discourse thread. The point of
Yes,
They're not unrelated. These are important functions to a whole set of "data"-related packages. As stated above, this is generally agreed upon (including core devs) as a valid, currently good way to handle optional dependencies. I've also laid out clear guidelines around how the package will be managed, including strict constraints on what can be included to ensure fast-as-possible load times and a lightweight dependency. I'm personally not a fan of creating multiple "interface" packages just for the sake of separating things that can just as well live in a single interface package; I also strongly believe in keeping things together until there's a clear need/request to separate, which we can always do later on. It's just a lot of unnecessary administrative churn for the registry, package maintenance, etc. to have multiple packages and then find out we don't need one.
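For readers unfamiliar with the pattern being defended here, the following is a minimal sketch of how a single lightweight interface package can stand in for optional dependencies. All module and function names (`MiniDataAPI`, `PackageA`, `PackageB`, `summaryline`) are hypothetical stand-ins for illustration and are not taken from the thread or from DataAPI.jl itself.

```julia
# Hypothetical stand-in for a lightweight interface package: it only declares
# a generic function (no methods, no dependencies), so loading it is cheap.
module MiniDataAPI
    # A function with zero methods; other packages add methods to it.
    function summaryline end
end

# Hypothetical package A: defines a type and extends the shared function.
module PackageA
    import ..MiniDataAPI
    struct Pooled
        n::Int
    end
    MiniDataAPI.summaryline(x::Pooled) = "Pooled with $(x.n) values"
end

# Hypothetical package B: calls the shared function without depending on PackageA.
module PackageB
    import ..MiniDataAPI
    report(x) = "report: " * MiniDataAPI.summaryline(x)
end

# PackageB works with PackageA's type even though neither depends on the other.
result = PackageB.report(PackageA.Pooled(3))
```

Both hypothetical packages depend only on the interface module, which is the trade-off argued for above: one small shared dependency instead of load-time hooks.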
Isn't this the approach you've taken w/ IterableTables.jl though? Implemented the interfaces for a bunch of packages, and then work on moving those implementations upstream? I mean, I agree that generally it's not desirable, but it's also the current lack-of-optional-dependency world we live in. Interfaces are being created and people want implementations and we don't always get them defined in the upstream parent package right away, and probably for the best so that implementations can mature and be proven out. But ideally, once things settle, things can be moved upstream.
Requires.jl is indeed slow and subtly so. I spent a full two days digging into its internals and basically started the discourse thread above based on my findings. Every time a package is loaded, it does checks around any registered callbacks; the checks are compounded when multiple packages get involved, each registering callbacks, and an expensive I'm not aware of any current efforts to provide real optional dependency management in 1.3 or even in the foreseeable future. Taking a dependency on DataValues.jl in Tables.jl also isn't an option; currently DataValues.jl takes ~0.2 seconds to load, and Tables.jl w/o Requires.jl takes ~0.1 seconds; Tables.jl w/ Requires.jl (current master) takes ~0.3 seconds to load, so it wouldn't help at all. Having Requires.jl is also problematic for Tables.jl because it is one of these "lower stack" type packages; lots of other packages depend on it and the Requires.jl performance hit is compounded. |
Regarding |
But even if DataValues overloads |
I'm opening this issue as a place to discuss the path forward for this package and for people to give their feedback. Here's where things are at in my mind:
refX DataAPI.jl functions; while I understand the basics, I'm not as familiar w/ the implementations, but I'm willing to take a stab at it if people would like. @nalimilan and @piever are much more aware of how packages like PooledArrays, CategoricalArrays, and StructArrays can take advantage of sharing the common ref functions. I think it was also suggested at some point that we may want a RefArrays.jl package that was home to various sorting/grouping optimization routines that DataFrames/StructArrays could then share. I'm happy to push forward on making those changes, but I'll need to have some discussions w/ @nalimilan and @piever for guidance.
In terms of steps forward, here's what I think:
ref* function API is solid and draft PRs showing how packages could share these functions
Please ping anyone else who might be interested or have something useful to add to the discussion here. I don't think there's a super rush on any of this, but I know DataFrames is approaching a 1.0 release in the next few months and it would be good to clean up its dependencies soon.
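As a rough illustration of what sharing common ref functions between array packages could look like, here is a minimal sketch. The names `MiniRefAPI`, `refarray`, `refvalue`, `ToyPooled`, and `countby` are assumptions chosen for this example; the thread only refers to the functions as refX / ref*, and the final API may differ.

```julia
# Hypothetical interface module with generic fallbacks.
module MiniRefAPI
    # Fallbacks: a plain array is its own reference array,
    # and a reference maps to itself.
    refarray(A::AbstractArray) = A
    refvalue(A::AbstractArray, r) = r
end

# Hypothetical pooled vector: small integer codes plus a pool of unique values.
struct ToyPooled{T}
    refs::Vector{Int}
    pool::Vector{T}
end

# The pooled type opts in by exposing its integer codes and a code-to-value lookup.
MiniRefAPI.refarray(A::ToyPooled) = A.refs
MiniRefAPI.refvalue(A::ToyPooled, r::Integer) = A.pool[r]

# A generic grouping routine written only against the two ref functions.
# For pooled data it counts cheap integer codes and decodes each code once.
function countby(A)
    refs = MiniRefAPI.refarray(A)
    refcounts = Dict{eltype(refs),Int}()
    for r in refs
        refcounts[r] = get(refcounts, r, 0) + 1
    end
    return Dict(MiniRefAPI.refvalue(A, r) => n for (r, n) in refcounts)
end

counts_pooled = countby(ToyPooled([1, 2, 1, 1], ["a", "b"]))
counts_plain = countby(["a", "b", "a", "a"])
```

The same `countby` serves both the pooled and the plain vector, which is the kind of routine a shared package could host so that each array package does not reimplement it.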