Use columntable in datavaluerows for col sources #279
Actually, just out of curiosity: why is the call to
Codecov Report
Coverage diff, main vs. #279:

            main      #279
Coverage    94.88%    94.88%
Files       7         7
Lines       665       665
Hits        631       631
Misses      34        34
I did not implement it, but it turns a type-unstable container into a type-stable one.
This is necessary for performance in Query.jl, where QueryOperators expect type-stable rows. At one point, we had the case where we were calling
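To illustrate what "type-stable" means here, the following is a minimal sketch in plain Julia, not the Tables.jl implementation. The names `loose`, `strict`, `colnames` and `rowtype` are hypothetical, and the `Dict` merely stands in for any container whose column types are not part of its own type.

```julia
# Hypothetical type-unstable container: the container type only says
# "some AbstractVector" for every column, so the row type cannot be inferred.
loose = Dict{Symbol,AbstractVector}(:a => [1, 2, 3], :b => ["x", "y", "z"])

# A NamedTuple of vectors carries each column's concrete type in its own type.
colnames = (:a, :b)
strict = NamedTuple{colnames}(Tuple(loose[n] for n in colnames))

# Row type implied by a NamedTuple of columns (illustrative helper).
rowtype(cols::NamedTuple) =
    NamedTuple{keys(cols), Tuple{map(eltype, values(cols))...}}

println(valtype(loose))    # the only column type the Dict exposes
println(rowtype(strict))   # a fully concrete row type
```

With the second form, an iterator over rows can declare a concrete element type up front, which is the property the comment above says QueryOperators rely on.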
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
@quinnj one more question: in the case where

    r = Tables.rows(x)
    s = Tables.schema(r)
    s === nothing && error("Schemaless sources cannot be passed to datavaluerows.")
    return DataValueRowIterator{datavaluenamedtuple(s), typeof(s), typeof(r)}(r)

but rereading your comment above, that would probably be type-unstable, right?

The scenario that I have in the back of my mind is row-wise reading from CSV.jl in combination with Query.jl. It would be nice if something like

    CSV.Rows(filename) |> @filter(...) |> @select(...) |> DataFrame

would just work out of the box. My thinking had been that if we merge this PR here and then add:

    IteratorInterfaceExtensions.getiterator(x::CSV.Rows) = Tables.datavaluerows(x)
    IteratorInterfaceExtensions.isiterable(::CSV.Rows) = true
    TableTraits.isiterabletable(::CSV.Rows) = true

it would all work, but now I'm thinking that we might end up with a type-unstable situation, right? Is there some equivalent of
I think for the rows case, we just want to wrap them as-is with
@quinnj, the idea here would be that https://github.com/JuliaData/DataFrames.jl/blob/b1fdb7621fa38c381dad928317ec1de562d09e71/src/other/tables.jl#L93 could be rewritten as
But I'm not entirely sure this is right :) I think this PR only makes sense if the idea is that for any table that is primarily column-based, one should first create a columntable before calling the datavaluerows function. Is that so? I don't really understand the underlying mechanics there.
The benefit of this PR would be that the instructions for any Tables.jl source that also wants to enable TableTraits.jl integration would be a bit simpler, i.e. the instructions would be to just always add
and that would work regardless of whether the source internally stores things as columns or rows.