Fully support raw vectors #5100

Open
HughParsonage opened this issue Aug 14, 2021 · 4 comments


@HughParsonage
Member

Raw vectors can be an attractive type for certain low-cardinality variables because of their low memory footprint. However, raw vectors are not fully supported in data.table. For example:

DT <- data.table(x = c(1, 2), y = raw(2))
setkey(DT, x)  # ok
setkey(DT, y)  # not supported

DT <- data.table(x = c(1, 2, 1), y = raw(3))
setkey(DT, x)  # error: Item 2 of list is type 'raw' which isn't yet supported (SIZEOF=1)

Is there a reason raw vectors can't be used?

@MichaelChirico
Member

#5180 handled the last example.

Note that ordering a raw vector, which setkey(DT, y) would require, is also not supported in base R:

order(as.raw(2:1))
# Error in order(as.raw(2:1)) : unimplemented type 'raw' in 'orderVector1'

And this is intentional, per ?Comparison:

Raw vectors should not really be considered to have an order, but the numeric order of the byte representation is used.

So I'm not sure we should break parity with base here without a strong use case in mind.

@HughParsonage
Member Author

HughParsonage commented Feb 21, 2024

The problem with not supporting raw vectors is that (a) data.frames (and data.tables) allow raw columns and (b) if you forbid ordering of such columns, you're limited in what you can do with the data frame. I think I understand the rationale behind that note in ?Comparison, but I don't support its strict enforcement (and note that raw vectors can be compared using <, so even base R does not apply it strictly). I think it's reasonable that x < y realizes an ordering.

Moreover, looking at the OP, even if we accept that raw vectors should not have an order, there's no reason why DT[, .N, by = y] shouldn't work.
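The point about < can be checked in base R alone. The following is a minimal sketch, not a proposal for how data.table should implement anything; the variable names are illustrative, and the use of as.integer() as a sort key is an assumed workaround rather than something suggested in the thread.

```r
x <- as.raw(c(3, 1, 2))
y <- as.raw(c(2, 2, 2))

# Elementwise comparison is defined for raw vectors (byte order is used).
lt <- x < y

# order() on a raw vector raised an error in the R version quoted in this
# thread; capture the outcome instead of stopping, since behaviour may
# differ across R versions.
ord_attempt <- tryCatch(order(x), error = function(e) conditionMessage(e))

# Assumed workaround: integers 0..255 sort in the same order as the bytes,
# so as.integer() can serve as a sort key.
ord <- order(as.integer(x))
sorted <- x[ord]
```

Here lt is FALSE, TRUE, FALSE and sorted holds the bytes 01, 02, 03, which is the ordering that x < y implies.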

@MichaelChirico
Member

> you're limited in what you can do with the data frame

yes, but that's also true of list/expression columns. raw columns might be some intermediate state of an analysis, with the columns getting converted to allow further processing later, for example. again it would be useful to have some concrete use cases in mind as we did for complex support.

> there's no reason why DT[, .N, by = y] shouldn't work.

same again as list columns, but we only support grouping by orderable column types as of now (as opposed to unique-able column types). it would be a pretty substantial effort to refactor+abstract the code base to allow a mix of the two.
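To illustrate the distinction between orderable and unique-able types, here is a small base R sketch of counting groups without ever sorting. It only shows that raw vectors are unique-able in base R via unique() and match(); it says nothing about how data.table's grouping internals work, and the names groups, idx and counts are made up for the example.

```r
y <- as.raw(c(1, 2, 1, 3, 2, 1))

# unique() and match() rely on hashing, so no ordering of y is needed.
groups <- unique(y)        # distinct values in order of first appearance
idx    <- match(y, groups) # group index for every element of y

# Count the members of each group, analogous in spirit to .N by group.
counts <- tabulate(idx, nbins = length(groups))
```

With this input, groups is 01, 02, 03 and counts is 3, 2, 1.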

@HughParsonage
Member Author

HughParsonage commented Feb 21, 2024

I guess my counter to that point is that list/expression columns should in principle be supported too, but are difficult to implement. Raw vectors are by contrast easy to implement. I don't accept that raw vectors are not orderable -- that order doesn't work with them is an idiosyncratic choice by R -- since x < y is well-defined for raw x and y.

R is not really consistent on this point and I don't think we do much for our sanity in trying to adhere to it. For example order works on factor columns but < doesn't. In my view, order is less appropriate for factors than for raw vectors, especially as there is a separate class for ordered factors. Yes, raw vectors may be used in contexts where the order is not meaningful, but that is true for all of the atomic classes: it's not uncommon for data sets to use integer vectors to represent categorical variables without any order.

Note we support integer64 ordering and that is much more demanding.
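The factor inconsistency mentioned above can be reproduced with a short base R sketch. The values are illustrative only; the warning from < on an unordered factor is suppressed so that the result can be inspected.

```r
f <- factor(c("b", "a", "c"))

# order() accepts an unordered factor and sorts by its integer codes.
ord <- order(f)

# '<' is not meaningful for unordered factors: R warns and returns NA.
lt <- suppressWarnings(f < "b")

# An ordered factor, by contrast, supports '<' against one of its levels.
o <- factor(c("b", "a", "c"), ordered = TRUE)
lt_ordered <- o < "b"
```

Here ord is 2, 1, 3, lt is all NA, and lt_ordered is FALSE, TRUE, FALSE.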
