Using a function inside by causes troubles
#5583
Comments
Thank you for the report. Is it reproducible on the master branch? 1.14.6 is far behind master.
I can reproduce this on current master. It looks similar to #5361; at least the observed problem is the same.
As already pointed out by @AbrJA, using |
As a temporary workaround (if someone is facing the same issue), using "keyby" instead of "by" solves the problem.
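A minimal sketch of that workaround, assuming a small keyed table (the names DT, a and neg are made up for illustration and are not from the original report). keyby sorts the result by the grouping columns and sets the key accordingly, so the key and the actual row order agree:

```r
library(data.table)

# Hypothetical data: a table keyed on `a`.
DT <- data.table(a = 1:4, key = "a")

# Group by a strictly decreasing function of the key column,
# but with keyby instead of by.
res <- DT[, .N, keyby = .(neg = -a)]

# keyby orders the groups and marks the result as keyed on them.
print(key(res))   # "neg"
print(res$neg)    # -4 -3 -2 -1
```

The cost is that the groups come back sorted rather than in order of first appearance, which may or may not matter for your use case.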
OK, I think the key is that the function has to return a strictly decreasing ordering of the key column; not just any function in by will do:
DT=data.table(a=-1:2, key='a')
key(DT[,.N, by=.(a2=a**2)])
# NULL here
That would mean this line (Line 1538 in 70c64ac) is to blame, since a strictly decreasing input gives an empty order vector from forderv:
forderv(list(b=2:1), sort=FALSE, retGrp=TRUE)
# integer(0)
# attr(,"starts")
# [1] 1 2
# ... other attr...
Some related faulty logic determining the key of the result:
DT=data.table(a=1:3, b=3:1, key='a,b')
key(DT[,.N, by=.(a^b, rep(1L, 3L))])
# [1] "a" "rep"
I'm facing an issue when I summarize a data.table using a function inside the "by" clause.
Here is an example:
I think the issue is here: attr(*, "sorted")= chr "round", because dt_sum isn't actually sorted. I don't know if it's a known issue or whether there's documentation about it; I didn't find anything. Greetings!
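A minimal sketch of how one might check for this kind of mismatch, assuming a keyed table and a by expression that reverses the key order (DT, a and neg are hypothetical names, not the data from this report). It compares the key data.table reports for the result with whether the grouping column is really in increasing order:

```r
library(data.table)

# Hypothetical data: a table keyed on `a`.
DT <- data.table(a = 1:4, key = "a")

# Group by a strictly decreasing function of the key column.
# With plain `by`, groups come back in order of first appearance.
res <- DT[, .N, by = .(neg = -a)]

# The key data.table reports for the result ...
claimed_key <- key(res)
# ... versus whether the grouping column is really in increasing order.
really_sorted <- !is.unsorted(res$neg)

print(claimed_key)
print(really_sorted)   # FALSE: the groups are -1, -2, -3, -4
```

If claimed_key is non-NULL while really_sorted is FALSE, the result carries a "sorted" attribute it should not have, and later keyed operations on it may misbehave.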