TOML Backend #1436

Merged: 10 commits, Aug 17, 2023

@franzpoeschel (Contributor) commented May 8, 2023

Extracted from #1277, since that PR contains thematically related, but clearly distinct items.

TODO:

  • Performance testing: TOML parsing is quite slow in toml11. As this backend is meant for lightweight data with user interaction, this is not a huge problem.
  • Reduce full test suite
  • Docs
  • Merge "Better handling for file extensions" (#1473) first

@franzpoeschel
Contributor Author

We will not be able to activate TOML for our broad test suite as TOML serialization and deserialization are both quite slow.
For the intended usage, this is not a large problem as TOML is intended for small, handwritten datasets.

The following example is the write_and_read_many_iterations test that writes a file-based Series of 1030 iterations per backend. This first profile (created with google-perftools) is without the TOML backend activated:

[Image: prof_before]

With TOML backend activated, most time is spent parsing ~1000 TOML files:

[Image: prof]

@franzpoeschel force-pushed the topic-toml-backend branch 2 times, most recently from a001ac8 to bc141aa (May 10, 2023 15:59)
@franzpoeschel force-pushed the topic-toml-backend branch 2 times, most recently from 8ef584e to 620cf53 (May 24, 2023 14:44)
@franzpoeschel force-pushed the topic-toml-backend branch 3 times, most recently from 922f674 to 3d0c786 (July 10, 2023 09:28)
@franzpoeschel franzpoeschel mentioned this pull request Jul 10, 2023
4 tasks
@ax3l ax3l self-requested a review August 17, 2023 03:19
@ax3l ax3l self-assigned this Aug 17, 2023
}
#if defined(__INTEL_COMPILER)
/*
* ICPC has trouble with if constexpr, thinking that return statements are
Member

I think this is likely with EDG frontends in general and might also affect nvcc. I remember we saw this there as well and reported it at some point (it should be fixed in newer CUDA versions, >12).

Contributor Author

So, we should change the macro to something more precise?
If yes, do you know what would be the right check?

Member

I think I was mumbling here and kept this for future reference, in case we see a warning from it. Nothing to do now.
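
For future reference, the kind of pattern discussed in this thread can be sketched as follows. This is a minimal illustration, not the code from this PR: `describeType` is a hypothetical helper, and the warning number 1011 ("missing return statement at end of non-void function") is an assumption about the classic Intel compiler's diagnostics. Every `if constexpr` branch returns, so the function is well-formed, but a frontend that does not account for discarded branches may still report a missing return; one possible way to handle that is to silence the diagnostic around the function.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

#if defined(__INTEL_COMPILER)
// Assumption: 1011 is the "missing return statement" diagnostic of ICPC.
#pragma warning(disable : 1011)
#endif

// Hypothetical helper (not part of openPMD-api): names the category of T.
// Exactly one branch is instantiated, and each branch returns a value.
template <typename T>
std::string describeType()
{
    if constexpr (std::is_integral_v<T>)
    {
        return "integral";
    }
    else if constexpr (std::is_floating_point_v<T>)
    {
        return "floating point";
    }
    else
    {
        return "other";
    }
}

#if defined(__INTEL_COMPILER)
// Restore the default behavior of the diagnostic after the function.
#pragma warning(default : 1011)
#endif
```

With other compilers the pragmas are skipped entirely, so the guard only affects builds where `__INTEL_COMPILER` is defined. A more precise macro check for other EDG-based frontends would need to be verified separately.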

break;
}
// TOML does not support nulls, so initialize with zero
Member

Interesting discussion: toml-lang/toml#30

Contributor Author

Probably the central bit of that discussion is "TOML is intended for configuration", which is exactly the use case for which we are adding the TOML backend.
If users insist on writing entire datasets with TOML, that's fine too, but it will be initialized with 0.
Otherwise, TOML is mostly intended for usage in conjunction with the follow-up #1493 which only writes the dataset metadata.
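
To illustrate the zero-initialization mentioned above, here is a minimal standalone sketch. It does not use toml11 or the openPMD-api internals; `zeroInitializedTomlArray` is a hypothetical helper that only shows the idea: since TOML has no null value, a dataset that has not been written yet is represented by an array filled with `0`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the backend's actual code): renders a 1D dataset
// of `extent` elements as a single TOML key/value line. TOML has no null,
// so every element is initialized with 0 as a placeholder.
// Assumption: `key` is a valid TOML bare key; no escaping is performed.
std::string zeroInitializedTomlArray(std::string const &key, std::size_t extent)
{
    assert(!key.empty());
    std::vector<long> data(extent, 0); // placeholder values instead of nulls
    std::ostringstream out;
    out << key << " = [";
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        if (i > 0)
        {
            out << ", ";
        }
        out << data[i];
    }
    out << "]";
    return out.str();
}
```

For example, `zeroInitializedTomlArray("data", 3)` yields the line `data = [0, 0, 0]`, which any TOML parser can read back, whereas a JSON-style `null` would not be valid TOML.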

Member

Agreed, that makes total sense. Should we be a bit more explicit in our guidance in docs/source/backends/json.rst, so that users do not mistake it for a full-fledged, high-performance data backend?

Contributor Author

Yes, I'll add something. Also, in #1493 we could think about enabling the abbreviated modes by default in TOML.

Member

@ax3l ax3l Aug 17, 2023

Great idea about the default with #1493, yes! :)

@ax3l ax3l mentioned this pull request Aug 17, 2023
4 tasks
franzpoeschel and others added 2 commits August 17, 2023 15:42
Member

@ax3l ax3l left a comment

Small suggestions on motivation.
Generalizing a bit and clarifying.

Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
Member

@ax3l ax3l left a comment

This is great, thank you! 🎉

@ax3l ax3l enabled auto-merge (squash) August 17, 2023 15:42
@ax3l ax3l merged commit 9ec90b6 into openPMD:dev Aug 17, 2023
eschnett added a commit to eschnett/openPMD-api that referenced this pull request Sep 5, 2023
* dev:
  Fix CMake: HDF5 Libs are PUBLIC (openPMD#1520)
  Fix `chmod` in `download_samples.sh` (openPMD#1518)
  CI: Old CTest (openPMD#1519)
  Python: Fix ODR Violation (openPMD#1521)
  replace extent in weighting and displacement (openPMD#1510)
  CMake: Warn and Continue on Empty HDF5_VERSION (openPMD#1512)
  Replace openPMD_Datatypes global with function (openPMD#1509)
  Streaming examples: Set WAN as default transport (openPMD#1511)
  TOML Backend (openPMD#1436)
  make it possible to manually set chunks when loading dask arrays (openPMD#1477)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1504)
  Optional debugging output for AbstractIOHandlerImpl::flush() (openPMD#1495)
  Python: 3.8+ (openPMD#1502)

# Conflicts:
#	.github/workflows/linux.yml
#	src/binding/python/Series.cpp