Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecordComponent: Properly handle uninitialized datasets #1316

Merged

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Oct 4, 2022

We currently have a non-functional check during flush() that a dataset has been initialized.
The check tests that the Datatype is not UNDEFINED, but it is never undefined since the constructor initializes it as CHAR.

We can use a std::optional to have a trustworthy check. This gives users a better feedback if they forget this call.

TODO

  • Merge Add Attribute::getOptional<T>() and use to add some more dynamic datatype conversions at read time #1278 first
  • The newly-added check uncovers that the CoreTests are mostly wrong, since they don't initialize the dataset. Since the CoreTests usually don't need to actually interact with the resulting dataset, they still run fine, but show a warning in the destructor. Maybe fix them.
    [~Series] An error occurred: Wrong API usage: [RecordComponent] Must specify dataset type and extent before flushing (see RecordComponent::resetDataset()).
    [~Series] An error occurred: Wrong API usage: [RecordComponent] Must specify dataset type and extent before flushing (see RecordComponent::resetDataset()).
    [~Series] An error occurred: Wrong API usage: [RecordComponent] Must specify dataset type and extent before flushing (see RecordComponent::resetDataset()).
    [~Series] An error occurred: Wrong API usage: [RecordComponent] Must specify dataset type and extent before flushing (see RecordComponent::resetDataset()).
    [~Series] An error occurred: Wrong API usage: [RecordComponent] Must specify dataset type and extent before flushing (see RecordComponent::resetDataset()).
    [~Series] An error occurred: Wrong API usage: [RecordComponent] Must specify dataset type and extent before flushing (see RecordComponent::resetDataset()).
    
  • Maybe turn this into a warning, and default to {CHAR, {1}} instead
  • Wait for release 0.15 before merging this. This might uncover bugs in downstream code, and they should have time to fix it.
  • Still problems at myPath and structure_test of CoreTests
  • Maybe check if the JSONIOHandlerImpl fix should be applied to other backends, too: The other backends only free their resources in the destructor, but don't flush

Notes:

  • This is also useful for things like [WIP] Use JSON/TOML template for defining openPMD metadata in a config file #1277, since that PR actually gives a dataset with UNDEFINED datatype a meaning (inside TOML templates). Distinguishing between "forgotten to specify" and "explicitly specified as UNDEFINED" is useful here.
    This is an argument in favor of not defaulting to {CHAR, {1}}
  • This seems to uncover some such bugs in our own test suite.

@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from 69d0f29 to 4f031f2 Compare October 12, 2022 09:32
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch 2 times, most recently from a85502d to ad7dad0 Compare October 26, 2022 12:36
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from ad7dad0 to f179f56 Compare November 2, 2022 13:02
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from f179f56 to 4dfaed6 Compare November 17, 2022 13:35
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch 2 times, most recently from ca88331 to 542069e Compare January 4, 2023 15:59
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from 542069e to 5bb1876 Compare January 16, 2023 10:18
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch 2 times, most recently from 3d39aa4 to f031bf8 Compare February 23, 2023 09:46
@franzpoeschel franzpoeschel added this to the 0.16.0 milestone Mar 2, 2023
@franzpoeschel franzpoeschel mentioned this pull request Mar 2, 2023
6 tasks
@ax3l ax3l self-requested a review March 7, 2023 19:03
@ax3l ax3l self-assigned this Mar 7, 2023
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from f959cd4 to f5033bd Compare March 10, 2023 14:44
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Mar 20, 2023
Todo: Merge functionality with openPMD#1316
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from f5033bd to bb29a44 Compare March 24, 2023 11:29
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Mar 24, 2023
Todo: Merge functionality with openPMD#1316
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Mar 24, 2023
Todo: Merge functionality with openPMD#1316
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from bb29a44 to 7909bab Compare March 30, 2023 11:16
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Mar 30, 2023
Todo: Merge functionality with openPMD#1316
@ax3l
Copy link
Member

ax3l commented Apr 10, 2023

This looks good! Anything on the open checks in the PR description we shall discuss tomorrow? :)

@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch 2 times, most recently from cba1825 to a0fbc49 Compare April 11, 2023 10:31
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Apr 11, 2023
Todo: Merge functionality with openPMD#1316
@franzpoeschel
Copy link
Contributor Author

This looks good! Anything on the open checks in the PR description we shall discuss tomorrow? :)

I've pushed a little fix that makes this whole PR a lot less invasive by just skipping any datasets, that have neither a dataset definition nor written chunks. That makes a flush directly after first access possible, in my opinion eliminating all remaining questions.

Fix the examples, the unfinished_iteratoin_test and the Coretests
It's not needed, and it avoids weird situations while recovering from an
error.
Just ignore and skip the component in that case
@franzpoeschel franzpoeschel force-pushed the fix-recordcomponent-dataset-uninitialized branch from a0fbc49 to c38c82a Compare April 12, 2023 09:34
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Apr 12, 2023
Todo: Merge functionality with openPMD#1316
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Apr 13, 2023
Todo: Merge functionality with openPMD#1316
Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ax3l ax3l added the internal label Apr 13, 2023
@ax3l ax3l merged commit 413b1c5 into openPMD:dev Apr 13, 2023
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Apr 14, 2023
Todo: Merge functionality with openPMD#1316
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Apr 27, 2023
Todo: Merge functionality with openPMD#1316
eschnett added a commit to eschnett/openPMD-api that referenced this pull request May 19, 2023
* dev: (82 commits)
  Docs: Linking to C++ Projects (openPMD#1445)
  CI: macOS-11 Update (openPMD#1446)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1437)
  Fix deprecated storeChunk APIs in first read/write examples (openPMD#1435)
  Update .readthedocs.yml (openPMD#1438)
  Doc: Fix Bib Authors (openPMD#1434)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1425)
  More careful documentation of streaming API (openPMD#1430)
  Fix gcc9 warning (openPMD#1429)
  Python bindings: Release GIL during IO wait operations (openPMD#1381)
  RecordComponent: Properly handle uninitialized datasets (openPMD#1316)
  Remove ADIOS1 - Long Live ADIOS2 (openPMD#1419)
  Post 0.15.0 Changelog Template (openPMD#1420)
  GitHub Actions: macOS has 3 Cores (openPMD#1421)
  `version.hpp`: 0.15.1 (openPMD#1417)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1416)
  Release 0.15.1 (openPMD#1414)
  Doc: More HTML Updates (CSS) (openPMD#1413)
  Enable clang-format also for .tpp files by using a regex instead of a predefined filter (openPMD#1403)
  Docs: Update Funding (openPMD#1412)
  ...

# Conflicts:
#	.github/workflows/linux.yml
#	CMakeLists.txt
@ax3l ax3l added the tests label Aug 18, 2023
ax3l added a commit to ax3l/openPMD-api that referenced this pull request Aug 18, 2023
ax3l added a commit that referenced this pull request Aug 19, 2023
* Fix gcc9 warning

the implicitly-defined constructor does not initialize 'openPMD::Datatype openPMD::detail::BufferedUniquePtrPut::dtype'

* More careful documentation of streaming API

* Doc: Fix Bib Authors

Make sure the bib authors match the quoted openPMD-standard authors.

* Update .readthedocs.yml

Update to newer Ubuntu, shipping a newer OpenSSL

* Fix deprecated storeChunk APIs in first read/write examples

* CI: macOS-11 Update

The older macOS image is now removed. The latest points already
to macOS-12. macoS-13 runners are marked experimental.

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

* Docs: Linking to C++ Projects

Move a section only written in the README to our
developer section on readthedocs.

* Use lazy imports for dask and pandas

* Better error message when loading to a buffer with mismatched type

* Remove schema 2021 from documentation

It will be removed and should no longer be advertised.

* Document ROMIO/HDF5/Chunking issue

open-mpi/ompi#7795

* Use a "minor" RST link

Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>

* HDF5: HDF5_DO_MPI_FILE_SYNC

Document a new work-around option for MPI-parallel HDF5 for
filesystems that are super limited for paralle I/O features
relevant in HPC.

* Fix typo

Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com>

* openpmd-pipe: set correct install permissions

* Fix headers in workflow.rst

* Fix documentation for preferred_flush_target in ADIOS2

* HDF5: Fix Char Type Matching

In HDF5, there are only the signed and unsigned char type.
The third `char` type is an alias to one or the other, different
to the C/C++ fundamental types.

This tries to fix the type matching order to be platform independent
between ppc64le/aarch64/x86-64.

* Add failing HDF5 test

* loadChunk(): consider equivalent char types for casting

* Doc strings

* newline & comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Doc: Sphinx Copybutton and Design

Add the Sphinx extensions `copybutton` (copy code blocks with
a single click) and `design` (for boxes, tabs, dropdowns, etc.).

* CI: Doxygen 1.9.7 Broken

Markdown support for main file broken in 1.9.7.
Go back to previous patch release. Upstream already fixed.

* Docs: Analysis

Add a data analysis & visualization section.
This is meant to show entry points and workflows to work with
openPMD data in larger frameworks and compatible ecosystems.

* [Draft] DASK, Pandas, ...

* Doc: DASK

* Pandas

* RAPIDS

* Typos

* Don't require unitSI when reading patch record component

* CI: oneAPI 2023.2.0

Update CI to breaking `apt` changes in the latest oneAPI release.

* Update __repr__ method of major objects in openPMD hierarchy

* Deal with trailing slashes in file paths

Happens in bash completion on BP files since they are folders.

* Throw better error messages when selecting a bad backend

* Adapt CoreTest to new error message

* Follow-up to #1470

Changes there left the PatchRecordComponent dirty after parsing

* HDF5IOHandler: Support for float128 data types with 80bit precision on ARM64/PPC64

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Unify little/big endian, use double80 also in other places

Long double size 8 -> 16

Use malloc to avoid alignment issues

Same treatment for complex long double

Add this for readAttribute too

Avoid non-native datatypes in writing

* Make this fix little-endian only

* Add comment

* Suggestions from review

* Use new instead of malloc everywhere

Also, add a more explanative comment

* CI fix: type comparison in Python

No idea why this check is triggered by this PR

* ADIOS2: Ensure that a step is always active at write time

Even when not using steps, dump everything into a single large step. BP5
will fail otherwise.

* CI: macOS 11.0+

We do not have the runners anymore to deploy and test macOS 10.15,
so we bump our tests to 11.0+, too.

Technically, we could build & deploy a bit longer on GH actions
`macos-11` for `10.15` (Catalina), but since it is unmaintained by
Apple/end-of-life already and macOS-11 is the oldest still maintained
OS by Apple, we do also stop support for it.

At the time of writing, old and maintained are:
- macOS 11, Big Sur
- macOS 12, Monterey

latest is: macOS 13, Ventura - and in preview is macOS 14, Sonoma.

This also unifies our arm64/aarch64 (M1/M2) requirements, which are
macOS 11.0+ as well.

* HDF5: Throw ReadError at dataset open

* Try parent types when a dataset datatype is unknown

* Update Sample Download Scripts

* Add test for old HDF5-plugin written openPMD dataset from PIConGPU

* Remove debugging comments

* Warn on BP5+Blosc in ADIOS2 v2.9 up to patch level 1

* Fix unused parameter

* Update Warning String

* Throw error when steps are needed but not supported

* Fix variableBasedSingleIteration test

* Test that the error is correctly thrown

* Introduce Series::parseBase alias for readIterations(), fix workflow

* Have only one instance of SeriesIterator

* Break memory cycle

* Use simple API in test again

* Snapshot attribute in file-based encoding

Snapshot attribute must be written in Iteration::endStep() in file-based
encoding

* Optional debugging output for AbstractIOHandlerImpl::flush()

* Revert "Optional debugging output for AbstractIOHandlerImpl::flush()"

This reverts commit ee8de45.

* Post-rebase fixes

1) Don't write snapshot attributes during initialization of an iteration
2) Catch unsuccessful flush run also in beginStep()

* Ensure that m_lastFlushSuccessful is always called

* Pandas DataFrames: Add Row Column Name

By default, the row index (!= particle index) in a pandas
dataframe has no name. This can be a bit cumbersome for
exports, e.g., to CSV - where this header field would just
be empty.

This PR names the index now "row", because it is not a
(macro) particle id property.

* Python: 3.8+

Python 3.7 went EOL last month. This bumps our supported
versions to 3.8+.

* Optional debugging output for AbstractIOHandlerImpl::flush()

* Add an environment variable for this

* Version 0.15.2 + Changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Python: Fix Series Compile

Fix compile issue introduced during backporting.

* Unmerge 0.16 content from variableBasedSeries test

* Reintroduce backend selection in variableBasedSeries test

Still needed due to ADIOS1 BP env variable

* Add src/Dataset.cpp to ADIOS1 source

* Replace openPMD_Datatypes global with function

* Windows CI: Bump Version to 0.15.2

* OPENPMD_BP_BACKEND=="ADIOS1": Skip BP5 Tests

* Streaming examples: Set WAN as default transport

* Exclude ADIOS1 from variableBasedSeries test

* Changelog: Bump Date

* replace extent in weighting and displacement

store extent value in n_particles

* Examples: Fix Types of Constants & Attributes

Backports from #1316 and #1510

* Changelog: Streaming Example Note

* CMake: Warn and Continue on Empty HDF5_VERSION

Seen on Conda-Forge for arm64 on macOS for HDF5 1.14.1

---------

Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com>
Co-authored-by: Dave Grote <grote1@llnl.gov>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ulrik Guenther <hello@ulrik.is>
Co-authored-by: Kara-Mostefa Ilian <ilian.kara-mostefa@student-cs.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants