Iteration::close #746

Merged
merged 37 commits into from
Jul 27, 2020

Conversation

franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented May 26, 2020

While the openPMD API currently has a data layout (file-based layout) that creates new files for each iteration of data, no explicit method to close such a file currently exists. This PR adds a call Iteration::close.

TODO:

  • Implement for backends other than ADIOS2
  • Iteration::close is deferred, like essentially all other calls in the API so far: the file will only be closed upon flushing. (1) I need to document that fact better. (2) Should we keep it that way? Maybe something like close(bool flush = true) could be helpful?
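
The deferred behaviour described in the TODO list can be illustrated with a small stand-alone model. This is only a sketch: `DeferredSeries` and `DeferredIteration` are hypothetical names, not classes of the openPMD API, and the `close(bool flush = true)` signature is the one proposed above, assumed here for illustration.

```cpp
#include <cstdint>
#include <set>

// Hypothetical stand-in for a series in file-based layout:
// one "file" per iteration, tracked here only by its index.
struct DeferredSeries
{
    std::set< std::uint64_t > openFiles;    // files currently open
    std::set< std::uint64_t > pendingClose; // close requests not yet flushed

    // Executes all deferred close requests.
    void flush()
    {
        for( auto index : pendingClose )
            openFiles.erase( index );
        pendingClose.clear();
    }
};

// Hypothetical handle to one iteration of a DeferredSeries.
struct DeferredIteration
{
    DeferredSeries & series;
    std::uint64_t index;

    // The close request is only enqueued.
    // With flush == true the series is flushed right afterwards.
    void close( bool flush = true )
    {
        series.pendingClose.insert( index );
        if( flush )
            series.flush();
    }
};
```

With `close( false )` the file stays open until the next explicit `flush()`, while the default closes it immediately.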

@ax3l ax3l added api: new additions to the API frontend: C++17 labels May 26, 2020
@ax3l
Member

ax3l commented May 26, 2020

Thanks a lot for the PR! I think going for now with close(bool flush = true) would be good! Maybe we get #484 at some point for more global control...

include/openPMD/IO/IOTask.hpp (outdated, resolved)
src/Iteration.cpp (outdated, resolved)
@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from 697b8a1 to 0b2a74c Compare June 2, 2020 18:12
@franzpoeschel
Contributor Author

franzpoeschel commented Jun 3, 2020

Does ADIOS1 have any issues with having multiple instances (i.e. a writer and a reader) interfere with one another? See the comment in the latest commit.

@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from 509af6e to 29e5251 Compare June 3, 2020 22:27
@franzpoeschel
Contributor Author

HDF5 seems to be working; ADIOS1 is still being a bit moody at the moment.

@ax3l
Member

ax3l commented Jun 8, 2020

Could be a double free.

@ax3l ax3l requested a review from guj June 8, 2020 07:05
/*
* This block will run fine if commented in
* But the following block will fail with a segfault
* upon adios_select_method ????
Member

@ax3l ax3l Jun 8, 2020

That's because you cannot initialize ADIOS1 twice in the same process.
ADIOS1 uses some nasty static state.

So: no two series that each use an ADIOS1 backend may be active at the same time in one process.
See: https://openpmd-api.readthedocs.io/en/latest/backends/adios1.html#limitations
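
The constraint can be modelled with a guard that refuses a second live instance in the same process. This is a sketch of the restriction only, not of how ADIOS1 or openPMD behave internally; `SingleInstanceBackend` is a hypothetical class, and the sketch assumes C++17 and single-threaded use.

```cpp
#include <stdexcept>

// Hypothetical wrapper around a library that keeps process-global state.
class SingleInstanceBackend
{
    // Number of live instances in this process (not thread-safe; sketch only).
    static inline int s_live = 0;

public:
    SingleInstanceBackend()
    {
        if( s_live > 0 )
            throw std::runtime_error(
                "backend already active in this process" );
        ++s_live;
    }

    ~SingleInstanceBackend()
    {
        --s_live;
    }

    // Copies would bypass the instance count, so they are forbidden.
    SingleInstanceBackend( SingleInstanceBackend const & ) = delete;
    SingleInstanceBackend &
    operator=( SingleInstanceBackend const & ) = delete;
};
```

Failing loudly on the second instance turns a hard-to-diagnose segfault into an explicit error at construction time.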

Contributor Author

@franzpoeschel franzpoeschel Jun 8, 2020

That explains things, was already suspecting something in this direction. So, the basic "close file" functionality should be working anyway for ADIOS1, but we should probably just run a simplified test, such as checking whether the file is on disk?
Alternatively, running an MPI test with two processes could circumvent that, but I'm feeling that would be overkill...

Member

@ax3l ax3l Jun 9, 2020

Good idea, for ADIOS1 we can use our helper methods to check we found the files on disk from a single rank instead of reading them back early. Let's keep the logic for the other backends as is, it looks good to me.

@franzpoeschel
Contributor Author

Writing attributes to indicate files having been closed has been problematic for read mode. I used a boolean flag on Iteration now to circumvent that.
I left the attribute in, since it will be necessary for streaming mode later on anyway.
Also fixed the ADIOS1 test.

@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from bba51a7 to a45a155 Compare June 15, 2020 22:42
* Indicates that an iteration has been logically closed.
* Will be physically closed upon next flush.
*/
std::shared_ptr< bool > m_closed = std::make_shared< bool >( false );
Member

Can you quickly remind me why this cannot be a plain member? Is some lifetime of the series/iteration preventing this?

Contributor Author

class Iteration is designed as a handle, all relevant information should be shared between copies.

auto iteration0 = series.iterations[0];
iteration0.close();

If the flag were not behind a shared_ptr, we would then get:

iteration0.m_closed == true
// but
series.iterations[0].m_closed == false

(This touches a broader point that we should keep in mind if we renovate the frontend some day: while most classes in the openPMD API are handles, this is currently implemented in a rather ad-hoc way. I would prefer a design similar to ADIOS2, i.e. have some non-copyable resource like internal::Iteration and a user-facing handle external::Iteration that acts like a shared_ptr to internal::Iteration.)
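
The difference can be reproduced with two toy handle types. `PlainHandle` and `SharedHandle` are hypothetical names used only for this sketch; they are not part of the openPMD API.

```cpp
#include <memory>

// Handle with a plain member: every copy owns its own flag.
struct PlainHandle
{
    bool closed = false;

    void close()
    {
        closed = true;
    }
};

// Handle with shared state: all copies observe the same flag.
struct SharedHandle
{
    std::shared_ptr< bool > closed = std::make_shared< bool >( false );

    void close()
    {
        *closed = true;
    }
};
```

Closing a copy of a `PlainHandle` leaves the original open, whereas closing a copy of a `SharedHandle` is visible through every copy, which is the behaviour wanted for `series.iterations[0]`.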

* to avoid reading undefined data if a backend implements optimizations
* based on this information.
*/
std::shared_ptr< bool > skipFlush = std::make_shared< bool >( false );
Member

why do we need both m_closed and [m_]skipFlush?

Contributor Author

Documentation here is outdated, will fix – also maybe the naming should be redone, something like logicallyClosed and physicallyClosed. logicallyClosed means that the iteration has been closed, but the corresponding flush has not run yet.

Member

@ax3l ax3l Jun 19, 2020

then I would just call these states m_closed and m_flushed or so. (we have no control when something really is done transferring, that is up to the backend we talk to.)

Contributor Author

Hm, I think m_flushed does not sound very clear about what it's doing?
I mean, while you can certainly argue that m_physicallyClosed is not technically correct, it is "physically closed" from the frontend's perspective. Anything from now on is the backend's problem which the frontend needs not and cannot interfere with any longer. I'd say that's close enough? :D

Alternatively, what about m_closedInFrontend and m_closedInBackend?

Contributor Author

Changed this thing into an enumeration that can take the values Open, ClosedInFrontend and ClosedInBackend now.
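
A minimal sketch of such a three-state enumeration and its transitions follows. The enumerator names are the ones mentioned above; the free functions `onClose` and `onFlush` are hypothetical helpers for illustration and do not claim to mirror the merged implementation.

```cpp
// The three states named in this thread.
enum class CloseStatus
{
    Open,             // iteration is in normal use
    ClosedInFrontend, // close() was called, flush still pending
    ClosedInBackend   // the flush has handed the close to the backend
};

// User calls close(): Open -> ClosedInFrontend.
// Closing an already closed iteration changes nothing.
inline CloseStatus onClose( CloseStatus s )
{
    return s == CloseStatus::Open ? CloseStatus::ClosedInFrontend : s;
}

// A flush runs: ClosedInFrontend -> ClosedInBackend.
// Open iterations stay open, backend-closed ones stay closed.
inline CloseStatus onFlush( CloseStatus s )
{
    return s == CloseStatus::ClosedInFrontend
        ? CloseStatus::ClosedInBackend
        : s;
}
```

A single enumeration rules out contradictory combinations that two independent boolean flags would allow, such as "flushed but never closed".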

@@ -564,6 +564,10 @@ Series::flushFileBased()
bool allDirty = dirty;
for( auto& i : iterations )
{
if ( *i.second.skipFlush )
{
continue;
Member

Shouldn't we throw on this in case a user continues to issue API calls on an (accidentally) closed iteration?

Contributor Author

The flushing procedures haven't been written with the possibility in mind that an iteration might not be present any longer. For example, flushing an iteration will always open the corresponding file, regardless of whether it is still there or not.

Member

@ax3l ax3l Jun 19, 2020

Probably the suggested renaming above will already help a bit. Anyway, this looks like something that will be very hard to debug.

Maybe I missed this: When would a closed and flushed iteration be flushed again? When we change global parameters of an overall series such as extension at a late point, maybe? Sounds legit to ignore that and we should maybe write some warnings to stderr when this happens.

Contributor Author

When would a closed and flushed iteration be flushed again?

The idea behind continuing with the next iteration in that case is exactly to not flush it again. I've now added sanity checks that verify the iteration has not been wrongly accessed, so that we can safely skip it.
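
The skip-with-sanity-check idea can be sketched as follows. `FlushableIteration`, its two flags and `flushAll` are hypothetical and stand in for the real bookkeeping; in particular, throwing on a modified closed iteration is an assumption of this sketch, not a statement about what the merged code does.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical per-iteration bookkeeping.
struct FlushableIteration
{
    bool closedInBackend = false; // already closed and flushed
    bool dirty = false;           // modified since the last flush
};

// Flushes all iterations that are still open and returns how many
// were flushed. Closed iterations are skipped, but only after
// checking that nobody modified them after the close.
inline std::size_t
flushAll( std::map< std::uint64_t, FlushableIteration > & iterations )
{
    std::size_t flushed = 0;
    for( auto & entry : iterations )
    {
        auto & it = entry.second;
        if( it.closedInBackend )
        {
            if( it.dirty )
                throw std::runtime_error(
                    "iteration " + std::to_string( entry.first ) +
                    " was modified after being closed" );
            continue; // safe to skip, nothing to write
        }
        it.dirty = false; // stands in for the actual write
        ++flushed;
    }
    return flushed;
}
```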

@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from c17266d to 0c91c36 Compare June 22, 2020 04:50
@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch 2 times, most recently from 9bfad70 to 696803a Compare June 24, 2020 06:19
@ax3l
Member

ax3l commented Jul 9, 2020

@franzpoeschel please rebase against latest dev (or merge it in) for latest CI updates.

@ax3l
Member

ax3l commented Jul 9, 2020

Ah wait, I actually see a memory leak is caught. Is that from ADIOS2? In case it's not wrong usage on our side that causes this we can report it upstream and add a temporary ignore in .github/ci/sanitizer/clang/Leak.supp

@franzpoeschel
Contributor Author

I intentionally leak memory in the serial IO tests to avoid throwing in a destructor 1, 2.

Looking at #709 again, I should check my implementation again to see whether there is actually a throwing destructor somewhere (the issue I fixed here only came up further down the line in the streaming branch, so we might be fine just letting the destructor run normally). Will look into it tomorrow.

This is currently not necessary, since closing an iteration in
group-based mode is a no-op. It will become relevant in streaming-based
mode, where closing an iteration means discarding its corresponding data
packet.
1) Remove debugging output
2) Delete buffered attribute writes after performing them
@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from 773578a to c320171 Compare July 10, 2020 10:07
@franzpoeschel
Contributor Author

I don't know exactly where you saw the memory leak – is it gone now? I reverted the leak in the tests now.

@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from b27648c to 9a702f2 Compare July 21, 2020 11:23
* Background: Upon calling Iteration::close(), the openPMD API
* will add metadata to the iteration in form of an attribute,
* indicating that the iteration has indeed been closed.
* Useful mainly in streaming context.
Member

from the description alone I do not understand the difference between closed() and closedByWriter().

Do you need to add an additional sentence that states that the reader will probe this in a streaming context?

Suggested change
- * Useful mainly in streaming context.
+ * Useful mainly in streaming context when a reader
+ * inquires from a writer that it is done writing.

include/openPMD/Iteration.hpp (resolved)
* @return false Otherwise.
*/
bool
verifyClosed() const;
Member

@ax3l ax3l Jul 22, 2020

I would rename this to closeViolated() or stillClosed() or similar, verifyClosed() is not clear. What does this return if an iteration was never closed? The name would also depend on this.

@@ -182,6 +182,14 @@ class Mesh : public BaseRecord< MeshRecordComponent >

void flush_impl(std::string const&) override;
void read() override;

/**
Member

Writing a doxygen string without an element that captures it will propagate the block to the next location where it can "dock". This will lead to undesirable results.

Suggested change
- /**
+ /*

*
* @return true If closed iteration had no wrong accesses.
* @return false Otherwise.
*/
Member

Suggested change
- */
+ * @todo needs to be implemented
+ */

Member

@ax3l ax3l Jul 22, 2020

Is it intentional that particles already have the method but meshes don't? I think this is missing a declaration here.

Contributor Author

I implemented the method on the base class instead (i.e. BaseRecord) since it also occurs somewhere in the hierarchy for particles. This is dead code, I removed it now.

Comment on lines 416 to 417
/*
* std::unordered_map::erase:
Member

Suggested change
- /*
- * std::unordered_map::erase:
+ /* std::unordered_map::erase:

auto fileID_it = m_fileIDs.find( writable );
if( fileID_it == m_fileIDs.end() )
{
return;
Member

Should that warn? Did you already see that happen? I fear such things will make other problems hard to debug, so just double-checking.

Contributor Author

I think we can let that throw instead. Closing a file should require that the file is present in the first place.
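
A sketch of the throwing variant, using a toy registry in place of the backend's internal map: `FileRegistry`, its `int` keys and `closeFile` are hypothetical and only illustrate replacing the silent `return` with an exception.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical registry mapping a handle id to the file it belongs to.
struct FileRegistry
{
    std::unordered_map< int, std::string > fileIDs;

    // Closes the file associated with `id`.
    // Throws instead of silently returning if no file is registered.
    void closeFile( int id )
    {
        auto it = fileIDs.find( id );
        if( it == fileIDs.end() )
            throw std::runtime_error(
                "closeFile: no file registered for id " +
                std::to_string( id ) );
        fileIDs.erase( it );
    }
};
```

With this shape, closing the same file twice surfaces as an error at the second call rather than being ignored.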

@@ -19,6 +19,9 @@
* If not, see <http://www.gnu.org/licenses/>.
*/
#include "openPMD/Iteration.hpp"

#include <tuple>
Member

please include std libs last

*m_closed = CloseStatus::ClosedInFrontend;
if( _flush )
{
Series * s = dynamic_cast< Series * >(
Member

please validate the dynamic_cast is not returning a nullptr: #745
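
One way to perform the requested validation is to wrap the cast in a helper that throws on failure. The class names `Node`, `SeriesNode` and `OtherNode` and the helper `asSeries` are hypothetical; they only mimic a parent pointer that may or may not refer to a series.

```cpp
#include <stdexcept>

// Hypothetical polymorphic base of the object hierarchy.
struct Node
{
    virtual ~Node() = default;
};

// Hypothetical series type that can be flushed.
struct SeriesNode : Node
{
    int flushCount = 0;

    void flush()
    {
        ++flushCount;
    }
};

// Some other node type that is not a series.
struct OtherNode : Node
{};

// Casts `parent` to a SeriesNode and throws if the cast fails,
// which also covers the case that `parent` itself is a nullptr.
inline SeriesNode & asSeries( Node * parent )
{
    auto * s = dynamic_cast< SeriesNode * >( parent );
    if( s == nullptr )
        throw std::runtime_error( "parent object is not a series" );
    return *s;
}
```

Returning a reference means callers never need to handle a null result themselves.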

test/SerialIOTest.cpp (resolved)
src/Iteration.cpp (outdated, resolved)
include/openPMD/Iteration.hpp (outdated, resolved)
include/openPMD/Iteration.hpp (outdated, resolved)
Also fix a bug in group-based mode
@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from ace60d0 to 8df6ebb Compare July 23, 2020 13:17
@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from 7dc754f to b37a5c5 Compare July 23, 2020 18:53
@franzpoeschel franzpoeschel force-pushed the topic-close-iteration branch from b37a5c5 to 55c2041 Compare July 23, 2020 18:58
@franzpoeschel
Contributor Author

franzpoeschel commented Jul 23, 2020

I think I have now added everything from your review, plus Python bindings.

Member

@ax3l ax3l left a comment

Awesome, thanks a lot! 🚀 ✨

@ax3l ax3l merged commit 0f38504 into openPMD:dev Jul 27, 2020
@ax3l ax3l changed the title Add functionality to close a file Iteration::close Sep 19, 2020
@franzpoeschel franzpoeschel deleted the topic-close-iteration branch January 28, 2021 13:26