feat: EDM4HEPSchema and Newstyle FCCSchema #1245
Can you rebase all the commits into one so that the 12MB file isn't in the repo history at all? Thanks!
9934f04 to eeeeef5
Done
@prayagyadav is there anything more you want to do on this PR?
yeah, gotta sort out some features which I missed. Also, I have to add more comments ...
OK - ping me when you are done! It's otherwise looking good so far.
…adable with the EDM4HEPSchema_v00_10_05
@prayagyadav you should be able to determine the C++ just from the file metadata, right? If so, if you dig through uproot a bit you can extract it without needing to resort to materializing an array. Otherwise let's wait for that PR to go through and an uproot release. Also, @nsmith- had some interesting things to say about dealing with streamers when we were talking yesterday. That may help here.
@lgray Using this script that I found, I can list all the relevant metadata for a given EDM4HEP-based file. I found out that some generations of FCC samples have the typename info in the Unfortunately, the older generations like the Spring2021 campaign do not have the typename info in metadata. Another issue is that the collection typenames are incorrect! Maybe it's just a problem with this script, but I am not sure. In the long term, it would be very beneficial to get access to the Collection IDs which are stored in the metadata. Is there a way to access the
Regarding the format of this metadata: I would like to point out that it is technically not covered by the stability guarantees we give in other places; see also the documentation, or a concrete example of how we want to change this: AIDASoft/podio#711. If you want to have the ground truth of the generation you can get to the actual full definition of the datamodel as JSON encoded string from
Maybe this is because the script gives you the names of the classes in the user-layer (see documentation), while what actually lands in the files comes from the POD layer. So if you see
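To make the user-layer versus POD-layer distinction concrete, here is a minimal sketch. It assumes the JSON-encoded datamodel definition has the same top-level layout as the yaml file (a `datatypes` mapping keyed by class name) and that each POD struct is named by appending `Data` to the user-layer class name. The JSON string, the helper `pod_typename` and the exact `vector<...>` spelling are illustrative assumptions; a given reader may print `std::vector<...>` instead.

```python
import json

# Hypothetical, heavily trimmed stand-in for the JSON-encoded datamodel
# definition; a real one would be read from the file, not typed in by hand.
DEFINITION_JSON = (
    '{"datatypes": {"edm4hep::MCParticle": {}, '
    '"edm4hep::ReconstructedParticle": {}}}'
)


def pod_typename(user_class):
    """Map a user-layer class name to the vector-of-POD type expected on disk.

    Assumption: the POD struct is the user-layer name with "Data" appended.
    """
    return f"vector<{user_class}Data>"


definition = json.loads(DEFINITION_JSON)

# For every datatype in the definition, record the typename we would expect
# a collection branch of that datatype to carry in the file.
expected = {name: pod_typename(name) for name in definition["datatypes"]}

for user_class, branch_type in expected.items():
    print(f"{user_class} -> {branch_type}")
```

Comparing such a table against the typenames reported for the branches is one way to tell whether a script is listing user-layer names rather than what is actually stored.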
@tmadlener - instead, the output of the script seems shuffled? I guess
Ah right, didn't catch that, sorry. From a quick look, I think this So Looking at
…vents_physlite.py
for more information, see https://pre-commit.ci
It seems with the latest uproot we're working rather well?
Yeah. I think I have everything in place: the improved tests, the docstrings and the comments. Seems ready for a merge.
how large are the yaml files that you've included in
Ah - only seems to be 10s of kb. Not worth compressing.
@lgray Hi, do you have an update on the PR? Its current status looks good to me. Let me know if some section of the code needs improvement. Thanks
Hi @prayagyadav I've been taking care of issues elsewhere in the ecosystem, sorry! I'll try to review your PR today or tomorrow. From a quick look it seems to be in decent shape, but let me see.
The code itself looks in good shape, is well commented where its function is unclear, and is well tested. Thanks for the contribution!
@lgray @davidlange6 @gomber
Here is a clean draft for the EDM4HEPSchema (edm4hep1) and FCCSchema based on the same. I have not yet managed to add many comments and descriptions, but I plan to add them eventually.
Workings:
The EDM4HEPSchema reads the edm4hep.yaml file from the assets directory. I felt this was necessary to add maximum functionality to the schema. Reading the specifications of all the 'components' and 'datatypes' from the yaml file helps to identify the 'members' (for example, energy is a member of the edm4hep::ReconstructedParticle datatype) and which members correspond to the various types of cross-branch relations in EDM4HEP: vector members, OneToOneRelations, OneToManyRelations and Links.
The Schema fetches the comments in the edm4hep.yaml file and assigns them as docstrings to the relevant branches.
The EDM4HEPSchema supports all these relations (with Links needing some manual boilerplate code from the user).
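As a rough illustration of the yaml-driven approach described above (not the code in this PR), the sketch below starts from a dictionary of the kind a YAML parser would return for edm4hep.yaml. It sorts each entry of a datatype by the section it appears in and keeps the trailing comment as a docstring. The excerpt, the regular expression and the helper `describe_datatype` are assumptions made for the example. Links are left out.

```python
import re

# Hypothetical excerpt of what a YAML parser would return for edm4hep.yaml.
# Entries follow a "type name [unit] // comment" pattern; the specific
# entries are illustrative and not copied from a particular release.
DATAMODEL = {
    "datatypes": {
        "edm4hep::ReconstructedParticle": {
            "Members": [
                "int32_t PDG // PDG of the reconstructed particle",
                "float energy [GeV] // energy of the reconstructed particle",
            ],
            "VectorMembers": ["float weights // illustrative vector member"],
            "OneToOneRelations": [
                "edm4hep::Vertex startVertex // start vertex of the particle",
            ],
            "OneToManyRelations": [
                "edm4hep::Cluster clusters // clusters used for this particle",
            ],
        }
    }
}

# Simple pattern: it does not handle types that contain spaces
# (for example "std::array<float, 3>"); such entries are skipped below.
MEMBER_RE = re.compile(
    r"^\s*(?P<type>\S+)\s+(?P<name>\w+)\s*"
    r"(?:\[(?P<unit>[^\]]*)\])?\s*(?://\s*(?P<doc>.*))?$"
)

SECTIONS = ("Members", "VectorMembers", "OneToOneRelations", "OneToManyRelations")


def describe_datatype(spec):
    """Return {member_name: {"kind", "type", "unit", "doc"}} for one datatype."""
    described = {}
    for kind in SECTIONS:
        for entry in spec.get(kind, []):
            match = MEMBER_RE.match(entry)
            if match is None:
                continue  # entry not covered by the simple pattern
            described[match["name"]] = {
                "kind": kind,
                "type": match["type"],
                "unit": match["unit"],
                "doc": match["doc"] or "",
            }
    return described


members = describe_datatype(DATAMODEL["datatypes"]["edm4hep::ReconstructedParticle"])
for name, info in members.items():
    print(f"{name}: {info['kind']} ({info['type']}) - {info['doc']}")
```

A table like `members` carries the two pieces of information the description relies on: which section a member belongs to (plain member, vector member or relation) and the comment text that can be attached to the corresponding branch as a docstring.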
The version of the edm4hep.yaml file used is here. Please note that the way Links are represented in EDM4HEP has changed in the latest commit. @tmadlener can comment more on this. In any case, it seems necessary to find a way to track the changes from edm4hep.yaml, so that the COFFEA EDM4HEPSchema does not become obsolete after a few version changes.
Link to example Notebooks:
Tests:
Other comments:
ExtraCode sections mentioned in edm4hep.yaml. They appear to be declarations for C++ methods specific to certain collections.