Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of variable_eltype in stack #3043

Merged
merged 4 commits into from
May 2, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
fix handling of variable_eltype in stack
  • Loading branch information
bkamins committed Apr 28, 2022
commit 8ad809cd81c0d0ad9d3f798daf92a04488b8c546
4 changes: 2 additions & 2 deletions src/abstractdataframe/reshape.jl
Original file line number Diff line number Diff line change
Expand Up @@ -164,8 +164,8 @@ function stack(df::AbstractDataFrame,
# (note that copyto! inserts levels in their order of appearance)
nms = names(df, ints_measure_vars)
simnms = similar(nms, variable_eltype)
catnms = simnms isa Vector ? PooledArray(catnms) : simnms
copyto!(catnms, nms)
copyto!(simnms, nms)
catnms = simnms isa Vector ? PooledArray(simnms) : simnms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call copyto!(simnms, nms) rather than copyto!(catnms, nms) later? Wouldn't the latter be faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without JuliaData/PooledArrays.jl#82 released this will not work. After I make a release of PooledArrays.jl I will make this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed this as indeed it is slightly faster. Here using copyto! should be safe as we know nms cannot have #undef so we are not affected by JuliaLang/julia#45125.

end
return DataFrame(AbstractVector[[repeat(df[!, c], outer=N) for c in ints_id_vars]..., # id_var columns
repeat(catnms, inner=nrow(df)), # variable
Expand Down
14 changes: 14 additions & 0 deletions test/reshape.jl
Original file line number Diff line number Diff line change
Expand Up @@ -845,4 +845,18 @@ end
end
end

@testset "variable_eltype in stack tests" begin
df = DataFrame(A = 1:3, B = [2.0, -1.1, 2.8], C = ["p","q","r"])
bkamins marked this conversation as resolved.
Show resolved Hide resolved
@test_throws MethodError stack(df, :C, variable_name=:D, variable_eltype=Int)
for T in (AbstractString, Any)
sdf = stack(df, [:A, :B], variable_name=:D, variable_eltype=T)
@test sdf == DataFrame(C=["p", "q", "r", "p", "q", "r"],
D=["A", "A", "A", "B", "B", "B"],
value=[1.0, 2.0, 3.0, 2.0, -1.1, 2.8])
bkamins marked this conversation as resolved.
Show resolved Hide resolved
@test sdf.C isa Vector{String}
@test sdf.value isa Vector{Float64}
@test sdf.D isa PooledVector{T}
end
end

end # module