Expand allow.cartesian error hints for non-equi joins #2086
Comments
Fetching indices for non-equi joins is slightly more complicated and at times requires a reordering, which in turn requires populating them entirely first. Even though it is rare, there might be cases where this triggers the …
I'm trying to non-equi join two really big tables to count the sizes of groups. The Cartesian join exceeds the 2^31 limit, so the only possibility for me is to aggregate with by = .EACHI. Unfortunately, it fails with the mentioned error. It's a critical bug for me. Please consider fixing it. Here is a tiny example to reproduce the error:
@fedyakov as a temporary fix, just add allow.cartesian = TRUE:
dt1[dt2, .N, on = .(a1 < a2, b1 < b2), by = .EACHI, allow.cartesian = TRUE]
#    a1 b1 N
# 1:  5  3 4
# 2:  6  3 4
Unless you've tried that and it fails?
I don't have the reproducible example now, but as far as I remember it did not work. The Cartesian join exceeds the 2^31 limit.
Closed by #4493. Related to #4489. 1.12.9 will make …
@fedyakov, there's also #3009 for joins that exceed 2^31 but are aggregated for .N. Then separately, there is #3957, which is explicitly about …
@ColeMiller1 My case is primarily about .EACHI aggregation. It doesn't require 64-bit vectors at all, since aggregation could be done alongside joining in a single stage and the result fits in a 32-bit vector. Could you consider a separate issue to fix this special case?
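To illustrate the point about result size, here is a minimal sketch with made-up data. The tables `dt1` and `dt2` reuse the column names from the workaround earlier in the thread, but their contents are an assumption and not the original reproducible example. It compares aggregating during the join (one output row per row of `dt2`) with materialising every matching pair first and counting afterwards.

```r
library(data.table)

# Hypothetical data, chosen only for illustration
dt1 <- data.table(a1 = c(1L, 2L, 3L, 4L), b1 = c(1L, 1L, 2L, 2L))
dt2 <- data.table(a2 = c(5L, 6L), b2 = c(3L, 3L))

# Aggregate while joining: the result has nrow(dt2) rows,
# no matter how many rows of dt1 each row of dt2 matches
eachi <- dt1[dt2, .N, on = .(a1 < a2, b1 < b2),
             by = .EACHI, allow.cartesian = TRUE]

# Materialise every matching pair first; this intermediate result
# is what grows with the number of matches
full <- dt1[dt2, on = .(a1 < a2, b1 < b2), allow.cartesian = TRUE]

# Counting afterwards (join columns carry the values from dt2)
counted <- full[, .N, by = .(a1, b1)]

nrow(eachi)  # rows in the aggregated result
nrow(full)   # rows in the materialised join
```

In this toy case the two approaches agree on the counts; the difference is only in how large the intermediate result becomes, which is the concern raised above for very large tables.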
Taken from a good question on SO...
The goal is to count, for every city and year in the data, how many firms are active (have YearFrom <= Year < quasiYearTo). This code works:
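A minimal sketch of such a join, using made-up data. The table names `firms` and `grid`, the `City` column name, and all values are assumptions for illustration; only `YearFrom`, `Year`, and `quasiYearTo` come from the description above.

```r
library(data.table)

# Hypothetical firms, each active over the half-open range [YearFrom, quasiYearTo)
firms <- data.table(
  City        = c("A", "A", "B"),
  YearFrom    = c(2000L, 2001L, 2000L),
  quasiYearTo = c(2003L, 2002L, 2002L)
)

# Every city-year combination to count over; CJ() returns unique, sorted rows
grid <- CJ(City = c("A", "B"), Year = 2000:2002)

# Non-equi join: a firm matches a grid row when
# City is equal and YearFrom <= Year < quasiYearTo.
# by = .EACHI evaluates .N once per row of grid.
counts <- firms[grid,
                on = .(City, YearFrom <= Year, quasiYearTo > Year),
                .N, by = .EACHI, allow.cartesian = TRUE]

print(counts)
```

In the result, the `YearFrom` and `quasiYearTo` columns both hold the `Year` value taken from `grid`, which is how data.table labels non-equi join columns.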
However, if I drop allow.cartesian, I get the usual error message:
So my i has no duplicates:
...and I am using by=.EACHI already. It would be nice if the error message gave a more relevant hint here (though I don't know what that might be, since I'm not actually clear on why I need allow.cartesian here). Anyway, a minor suggestion.
An alternative would be to turn off the cartesian check (or downgrade it from an error to a verbose message) if it's a non-equi join with by=.EACHI, since that case seems pretty safe.