feat: `unstack` receives kwarg `fillvalue` #2828
Why do you think so? I would assume that if
Ah - after reviewing, I see what you mean. I think what you have implemented is OK. One can run … However, I think it is important to test the additional scenarios I have mentioned, especially for … Other than that, the PR looks good.
@pstorozenko - the merge conflicts need fixing. Also, have you had time to add the tests I suggested?
@pstorozenko bump
Sorry for not answering.
Point 2 is correct. Point 1 - there is a corner case when
src/abstractdataframe/reshape.jl
unstacked_val = [fill!(similar(valuecol,
                               promote_type(eltype(valuecol), typeof(fillvalue)),
                               Nrow),
                       fillvalue) for _ in 1:Ncol]
@nalimilan - this definition is problematic for categorical columns. The problem is that:
julia> using CategoricalArrays
julia> valuecol = categorical(["a"])
1-element CategoricalArray{String,1,UInt32}:
"a"
julia> fillvalue = ""
""
julia> Nrow = 1
1
julia> similar(valuecol, promote_type(eltype(valuecol), typeof(fillvalue)), Nrow)
1-element Vector{String}:
#undef
julia> promote_type(eltype(valuecol), typeof(fillvalue))
String
Is there a generic idiom that would allow us (without introducing a dependency on CategoricalArrays.jl) to take the union of the types and, if that is OK (as it should be in the example above), produce a `CategoricalArray`? Or do you think we should change the definition of `similar` in CategoricalArrays.jl?
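For ordinary (non-categorical) vectors, the allocation idiom quoted from src/abstractdataframe/reshape.jl behaves as intended. The minimal sketch below reproduces it with Base types only; `alloc_unstacked` is a hypothetical helper name used for illustration, not a DataFrames.jl function, and the example deliberately avoids CategoricalArrays.jl, where `similar` returns a plain `Vector` as in the REPL session above.

```julia
# Minimal sketch of the allocation idiom from src/abstractdataframe/reshape.jl,
# using only Base types. `alloc_unstacked` is a hypothetical helper for illustration.
function alloc_unstacked(valuecol::AbstractVector, fillvalue, Nrow::Integer, Ncol::Integer)
    # Widen the element type so that `fillvalue` can be stored next to the values.
    T = promote_type(eltype(valuecol), typeof(fillvalue))
    # Build Ncol independent columns, each pre-filled with `fillvalue`.
    return [fill!(similar(valuecol, T, Nrow), fillvalue) for _ in 1:Ncol]
end

valuecol = [1, 2, 3]                            # Vector{Int}
cols = alloc_unstacked(valuecol, missing, 2, 3)
println(eltype(cols[1]))                        # Union{Missing, Int64} on 64-bit Julia
cols0 = alloc_unstacked(valuecol, 0.5, 2, 3)
println(eltype(cols0[1]))                       # Float64
```

Building the columns with a comprehension gives each column its own array; `fill(v, Ncol)` would instead store the same vector `Ncol` times, so writing into one column would change all of them.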
OK - I have thought about it. Given that we currently allow:
julia> x = categorical([1,2,3])
3-element CategoricalArray{Int64,1,UInt32}:
1
2
3
julia> CategoricalValue(3, x)
CategoricalValue{Int64, UInt32} 3
the behavior we have now should be OK.
I think it's OK. If we wanted to change this, we could adapt `promote_type` in CategoricalArrays, but it's orthogonal to this PR. There would certainly be advantages in promoting to `CategoricalValue`, though it could also create problems (e.g. when concatenating `Vector{Int}` and `CategoricalArray{Union{String, Missing}}`, the resulting type would be `Any`, but CategoricalArrays doesn't handle that well).
Yes - I think there would be more problems than benefits. We only need an easy way to create a standalone `CategoricalValue`, since currently it takes `CategoricalValue(scalar, categorical([scalar]))`, which is quite inconvenient.
test/reshape.jl
@test dfu.Var2 ≅ [2, 0]
@test typeof(dfu.Var2) <: CategoricalVector{Int}
@test levels(dfu.Var1) == levels(dfu.Var2) == 0:3
dfu = unstack(df, :variable, :value, fillvalue=CategoricalValue("0", categorical(["0"])))
@nalimilan - this test in particular is something we should think about. My judgement is that it is OK to produce such a union.
NEWS.md
@@ -22,6 +22,12 @@
(notably `PooledArray` and `CategoricalArray`) or when they contained only
integers in a small range.
([#2812](https://github.com/JuliaData/DataFrames.jl/pull/2812))
* the `unstack` function receives new keyword argument `fillvalue`
Maybe we should call this just `fill`? Or is that name useful in other contexts as a Boolean argument?
No - this name is specific only to this function. I will rename it to `fill`.
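Conceptually, `fill` sets the value placed in cells whose row/column key combination does not occur in the long-format input, with `missing` as the default. The toy sketch below illustrates that idea in plain Julia; `toy_unstack` and its argument names are hypothetical, and the function is a simplified stand-in that reuses the PR's `promote_type` widening rule, not how DataFrames.jl actually implements `unstack`.

```julia
# Toy, Base-only illustration of what the `fill` keyword controls.
# `toy_unstack` is hypothetical; it is NOT the DataFrames.jl implementation.
function toy_unstack(rowkey::AbstractVector, colkey::AbstractVector,
                     vals::AbstractVector; fill=missing)
    rkeys = unique(rowkey)
    ckeys = unique(colkey)
    # Same widening rule as in the PR: the result must hold both the values and `fill`.
    T = promote_type(eltype(vals), typeof(fill))
    out = Matrix{T}(undef, length(rkeys), length(ckeys))
    fill!(out, fill)  # combinations absent from the long data keep `fill`
    for (r, c, v) in zip(rowkey, colkey, vals)
        out[findfirst(==(r), rkeys), findfirst(==(c), ckeys)] = v
    end
    return rkeys, ckeys, out
end

rowkey = ["a", "a", "b"]
colkey = [:x, :y, :x]
vals = [1, 2, 3]
_, _, m = toy_unstack(rowkey, colkey, vals)            # cell ("b", :y) is missing
_, _, m0 = toy_unstack(rowkey, colkey, vals; fill=0)   # cell ("b", :y) is 0
```

With `fill=0` the element type stays `Int`, because `promote_type(Int, Int)` is `Int`; with the default `missing` it widens to `Union{Missing, Int}`.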
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
OK - I have renamed
@nalimilan - any additional comments here?
Thank you!
Closes #2698
I thought about checking every column separately for `promote_type`, but it would be a breaking change, since right now a value column with eltype `T` is always converted to `Vector{Union{Missing, T}}`. We may think about changing it in another PR.
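The breaking-change concern can be seen with Base types alone. Under the current behavior every unstacked column gets the same eltype, which with the default fill value `missing` is `Union{Missing, T}`. A per-column check would instead let some columns keep the narrower eltype `T`, depending on the data, which changes output types that existing code may rely on. In the sketch below, reading "checking every column separately" as "narrow a column when none of its cells were filled" is an assumption, `narrow_if_complete` is a hypothetical name, and narrowing with `identity.` is just one possible way to do it.

```julia
# Current behavior described above: one eltype for all unstacked columns,
# which with the default fill value `missing` is Union{Missing, T}.
T = Int
println(promote_type(T, Missing))  # Union{Missing, Int64} on 64-bit Julia

# Hypothetical per-column alternative (NOT what the PR does): narrow a column
# back to a non-missing eltype when none of its cells had to be filled.
# `narrow_if_complete` is an illustrative name, not a DataFrames.jl function.
narrow_if_complete(col::AbstractVector) = any(ismissing, col) ? col : identity.(col)

full = Union{Missing, Int}[1, 2]
gappy = Union{Missing, Int}[1, missing]
println(eltype(narrow_if_complete(full)))   # Int64: eltype now depends on the data
println(eltype(narrow_if_complete(gappy)))  # Union{Missing, Int64}
```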