Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

from_rdataframe doesn't support Long64_t #2018

Closed
agoose77 opened this issue Dec 19, 2022 · 2 comments · Fixed by #2023
Closed

from_rdataframe doesn't support Long64_t #2018

agoose77 opened this issue Dec 19, 2022 · 2 comments · Fixed by #2023
Assignees
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments

@agoose77
Copy link
Collaborator

agoose77 commented Dec 19, 2022

Version of Awkward Array

main

Description and code to reproduce

@miranov25 noticed this:

>>> ROOT.awkward.type_to_form["Long64_t"](0)
Type x is not recognized.

We obviously could support this in the type_to_name function, but I am unsure as to whether we want to handle these kinds of type aliases, which won't round-trip (only really important for flatlist_as_rvec, if at all).

If we decide that we don't support these aliases, then we should probably update the documentation :)

Incidentally, there is ROOT::Internal::RDF::TypeName2TypeID which behaves like our type_to_name. Do we not use this because it's internal?

@agoose77 agoose77 added the bug The problem described is something that must be fixed label Dec 19, 2022
@agoose77 agoose77 added bug (unverified) The problem described would be a bug, but needs to be triaged and removed bug The problem described is something that must be fixed labels Dec 19, 2022
@jpivarski
Copy link
Member

The right thing to do is just to add it to the type_to_name function. It's not a list that's going to grow indefinitely—Long64_t is an oddball because it's covering a gap from the transition from 32-bit architectures to 64-bit architectures (so now, it's a relic). Uproot had to deal with it, too. But there won't be a continuing series of additional type names to special-case like this.

Until we all move to 128-bit architectures, that is. (Not as far-fetched as it sounds: it's not for the size of the memory space; there are proposals to use the extra 64 bits for security.)

@jpivarski
Copy link
Member

Also, we can't make any kind of promises about round-tripping, non-destructively, from Awkward to RDataFrame and back (or from RDataFrame to Awkward and back). The two systems live in very different type-systems: the Awkward/Arrow set and all of C++. Any conversions of values from one type-system into the other should be thought of as a projection.

So Long_t and Long64_t both become np.int64 on a 64-bit system, because this is what they both mean. Even C++ compilers don't always maintain a distinction between typedef'ed names; if I want a C/C++ type check to know that A and B are different, I can't rely on

typedef A B;

It is necessary to do something like

struct B { A _; };

and unpack B._.

I did a little digging: here and here: Long_t maps to the C/C++ long type and Long64_t is a "Portable signed long integer 8 bytes," guaranteeing that the integer will be written into files as 64-bit, without relying on long being 64-bit on your system. But as long as we're talking about 64-bit systems, those two types are typedef'ed together.

@ianna ianna linked a pull request Dec 20, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants