.NROW as a way to access the nrow after subsetting in i but before entering a subgroup #4688

Closed
zerweck opened this issue Aug 28, 2020 · 3 comments


@zerweck

zerweck commented Aug 28, 2020

I use data.table a lot to calculate absolute frequencies on the fly, while also applying a subsetting flag in i in the same call.

iris[Sepal.Width<3.7,.N,.(Species)]

If I want to change this to relative frequencies within the subset, I need to save the number of rows after filtering as the denominator in a separate call, like this:

nrow_after_filtering <- iris[Sepal.Width<3.7,.N]
iris[Sepal.Width<3.7,.(rel_freq=.N/nrow_after_filtering) ,.(Species)]

This gets old when you do it a lot.
It would be great if a symbol .NROW similar to .N existed, returning the nrow of the data.table after applying i, but before going into the subgroup.

The name .NROW is probably bad, since it would be unclear how it differs from .N. Maybe .NI or .NFILT?

@ben-schwen
Member

Your intended behaviour can be achieved simply by chaining:
iris[Sepal.Width<3.7, .N, .(Species)][, rel_freq := N/sum(N)][]

@ColeMiller1
Contributor

Related: #1206

I like the concept, but I always think of @MichaelChirico's comment from that thread, as I sometimes wonder if there are too many symbols already, some of which are only triggered via NSE.

Not sure when symbol overload kicks in...

Finally, here was my code golf.

library(data.table)
as.data.table(iris)[Sepal.Width < 3.7,
                    {
                      N = .N
                      .SD[, .(rel_freq = .N / N), Species]
                    }
]
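
For anyone comparing the approaches in this thread, here is a minimal self-contained sketch that runs all three on the same filter and checks that they agree. The object names (`dt`, `n_filtered`, `two_step`, `chained`, `nested`) are made up for illustration, and `iris` is converted with `as.data.table()` first because it ships as a plain data.frame.

```r
library(data.table)

# iris is a data.frame by default, so convert it before using data.table syntax
dt <- as.data.table(iris)

# 1. Two-step version from the original post: store the filtered row count first
n_filtered <- dt[Sepal.Width < 3.7, .N]
two_step <- dt[Sepal.Width < 3.7, .(rel_freq = .N / n_filtered), by = Species]

# 2. Chaining: count per group, then divide by the sum of the group counts
chained <- dt[Sepal.Width < 3.7, .N, by = Species][, rel_freq := N / sum(N)][]

# 3. Capture .N in the outer j (no grouping yet), then group .SD
nested <- dt[Sepal.Width < 3.7, {
  n <- .N
  .SD[, .(rel_freq = .N / n), by = Species]
}]

# All three should give the same relative frequencies, which sum to 1
all.equal(two_step$rel_freq, chained$rel_freq)
all.equal(two_step$rel_freq, nested$rel_freq)
all.equal(sum(chained$rel_freq), 1)
```

The chained version keeps the absolute count `N` alongside `rel_freq`, which may or may not be wanted; the other two return only `Species` and `rel_freq`.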

@zerweck
Author

zerweck commented Sep 1, 2020

Related: #1206

As usual when I post something here, a better issue already exists that I somehow didn't find when searching :)

Also, thanks to both of you, @ben-schwen and @ColeMiller1, for the suggestions. I have used chaining in the past but somehow forgot about it again.

@zerweck zerweck closed this as completed Sep 1, 2020