Iteration::close #746
Conversation
Thanks a lot for the PR! I think going for now with
Force-pushed from 697b8a1 to 0b2a74c.
Does ADIOS1 have any issues with multiple instances (i.e. a writer and a reader) interfering with one another? See the comment in the latest commit.
Force-pushed from 509af6e to 29e5251.
HDF5 seems to be working; ADIOS1 is still being a bit moody at the moment.
Could be a double free.
test/SerialIOTest.cpp (outdated)
    /*
     * This block will run fine if commented in
     * But the following block will fail with a segfault
     * upon adios_select_method ????
That's because you cannot initialize ADIOS1 twice in the same process.
ADIOS1 uses some nasty static state.
So: no two series that are active at the same time in a process may each use an ADIOS1 backend.
See: https://openpmd-api.readthedocs.io/en/latest/backends/adios1.html#limitations
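A minimal sketch of how user code can work around this limitation by never keeping two ADIOS1-backed Series alive at the same time. The file name is a placeholder, and the `Access` enum is the spelling used in recent openPMD-api releases (older versions spell it `AccessType`):

```cpp
#include <openPMD/openPMD.hpp>

int main()
{
    {
        // Writer: scoped so the Series (and its ADIOS1 state) is destroyed
        // before a second ADIOS1-backed Series is opened in this process.
        openPMD::Series write( "data_%T.bp", openPMD::Access::CREATE );
        // ... define iterations, store chunks, flush ...
    } // writer Series destroyed here

    {
        // Reader: only opened after the writer Series has gone out of scope.
        openPMD::Series read( "data_%T.bp", openPMD::Access::READ_ONLY );
        // ... read data back and verify ...
    }
    return 0;
}
```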
That explains things; I was already suspecting something in this direction. So the basic "close file" functionality should be working anyway for ADIOS1, but we should probably just run a simplified test, such as checking whether the file is on disk?
Alternatively, running an MPI test with two processes could circumvent that, but I feel that would be overkill...
Good idea. For ADIOS1 we can use our helper methods to check from a single rank that the files are on disk, instead of reading them back early. Let's keep the logic for the other backends as is; it looks good to me.
Writing attributes to indicate that files have been closed has been problematic for read mode. I used a boolean flag on
Force-pushed from bba51a7 to a45a155.
include/openPMD/Iteration.hpp (outdated)
     * Indicates that an iteration has been logically closed.
     * Will be physically closed upon next flush.
     */
    std::shared_ptr< bool > m_closed = std::make_shared< bool >( false );
Can you quickly remind me why this cannot be a plain member? Is some lifetime issue of the series/iteration preventing this?
`class Iteration` is designed as a handle; all relevant information should be shared between copies.

    auto iteration0 = series.iterations[0];
    iteration0.close();

If the flag were not behind a `shared_ptr`, we would then get:

    iteration0.m_closed == true
    // but
    series.iterations[0].m_closed == false

(This touches a broader point that we should keep in mind if we renovate the frontend some day: while most classes in the openPMD API are handles, this is currently implemented in a rather ad-hoc way. I would prefer a design similar to ADIOS2, i.e. some non-copyable resource like `internal::Iteration` and a user-facing handle `external::Iteration` that acts like a `shared_ptr` to `internal::Iteration`.)
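A minimal sketch of the handle-over-internal-state pattern described above. The namespace names `internal`/`external` follow the comment; the member layout and method names are assumptions for illustration only:

```cpp
#include <memory>

namespace internal
{
    // Non-copyable resource that owns the actual state.
    struct Iteration
    {
        bool closed = false;
        Iteration() = default;
        Iteration( Iteration const & ) = delete;
        Iteration & operator=( Iteration const & ) = delete;
    };
}

namespace external
{
    // Cheap, copyable handle; all copies observe the same shared state.
    class Iteration
    {
        std::shared_ptr< internal::Iteration > m_impl =
            std::make_shared< internal::Iteration >();

    public:
        void close() { m_impl->closed = true; }
        bool closed() const { return m_impl->closed; }
    };
}
```

With this split, copying the handle never duplicates state, so `iteration0.close()` is visible through every other copy of the same iteration.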
include/openPMD/Iteration.hpp (outdated)
     * to avoid reading undefined data if a backend implements optimizations
     * based on this information.
     */
    std::shared_ptr< bool > skipFlush = std::make_shared< bool >( false );
Why do we need both `m_closed` and `[m_]skipFlush`?
Documentation here is outdated, will fix – also maybe the naming should be redone, something like `logicallyClosed` and `physicallyClosed`. `logicallyClosed` means that the iteration has been closed, but the corresponding `flush` has not run yet.
Then I would just call these states `m_closed` and `m_flushed` or so. (We have no control over when something really is done transferring; that is up to the backend we talk to.)
Hm, I think `m_flushed` does not sound very clear about what it's doing.
I mean, while you can certainly argue that `m_physicallyClosed` is not technically correct, it is "physically closed" from the frontend's perspective. Anything from now on is the backend's problem, which the frontend need not and cannot interfere with any longer. I'd say that's close enough? :D
Alternatively, what about `m_closedInFrontend` and `m_closedInBackend`?
Changed this into an enumeration that can take the values `Open`, `ClosedInFrontend` and `ClosedInBackend` now.
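A minimal sketch of what such a close-status enumeration could look like on the `Iteration` handle. The enumerator names and the `shared_ptr` member come from this thread and the diff further below; the method bodies are a simplified assumption, not the actual implementation:

```cpp
#include <memory>

// Tracks how far an iteration's close() has progressed.
enum class CloseStatus
{
    Open,              // iteration is still open for reads/writes
    ClosedInFrontend,  // close() was called, but the flush has not run yet
    ClosedInBackend    // the backend has actually closed the file/stream
};

class Iteration
{
    // Shared between copies of the handle (see the shared_ptr discussion above).
    std::shared_ptr< CloseStatus > m_closed =
        std::make_shared< CloseStatus >( CloseStatus::Open );

public:
    void close() { *m_closed = CloseStatus::ClosedInFrontend; }
    bool closed() const { return *m_closed != CloseStatus::Open; }
};
```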
@@ -564,6 +564,10 @@ Series::flushFileBased()
    bool allDirty = dirty;
    for( auto& i : iterations )
    {
        if ( *i.second.skipFlush )
        {
            continue;
Shouldn't we throw on this in case a user continues to issue API calls on an (accidentally) closed iteration?
The flushing procedures haven't been written with the possibility in mind that an iteration might not be present any longer. Flushing an iteration will e.g. always open the corresponding file, regardless of whether it is still there or not.
Probably the suggested renaming above will already help a bit. Anyway, this looks like something that will be very hard to debug.
Maybe I missed this: when would a closed and flushed iteration be flushed again? When we change global parameters of the overall series, such as the file extension, at a late point, maybe? It sounds legitimate to ignore that, and we should maybe write a warning to stderr when this happens.
When would a closed and flushed iteration be flushed again?
The idea behind continuing with the next iteration in that case is exactly to not flush it again. I've now added sanity checks that verify the iteration hasn't been wrongly accessed, so we can safely skip it.
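A minimal toy sketch of the skip-with-sanity-check idea for `Series::flushFileBased()`. The loop shape follows the diff above; `ToyIteration`, the `dirty` flag, and the error message are simplifications for illustration, not the real openPMD-api code:

```cpp
#include <map>
#include <stdexcept>

enum class CloseStatus { Open, ClosedInFrontend, ClosedInBackend };

struct ToyIteration
{
    CloseStatus status = CloseStatus::Open;
    bool dirty = false; // set when the user touches the iteration again
};

// Toy stand-in for Series::flushFileBased(): skip iterations that are
// already closed in the backend, but complain if they were accessed again.
void flushFileBased( std::map< unsigned long, ToyIteration > & iterations )
{
    for( auto & i : iterations )
    {
        if( i.second.status == CloseStatus::ClosedInBackend )
        {
            if( i.second.dirty )
                throw std::runtime_error(
                    "Accessed an iteration that has already been closed." );
            continue; // safe to skip: already flushed and closed
        }
        // ... flush the still-open iteration here ...
        i.second.status = CloseStatus::ClosedInBackend;
    }
}
```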
Force-pushed from c17266d to 0c91c36.
Force-pushed from 9bfad70 to 696803a.
@franzpoeschel please rebase against latest
Ah wait, I actually see a memory leak is caught. Is that from ADIOS2? In case it's not wrong usage on our side that causes this, we can report it upstream and add a temporary ignore in
I intentionally leak memory in the serial IO tests to avoid throwing in a destructor 1, 2. Looking at #709 again, I should check my implementation again to see whether there is actually a throwing destructor somewhere (the issue which I fixed in here only came up further down the line in the streaming branch, so we might be fine just letting the destructor run normally). Will look into it tomorrow.
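A minimal sketch of the underlying C++ concern: destructors are implicitly `noexcept`, so an exception escaping one (e.g. from a final flush on close) calls `std::terminate`. A common workaround is to catch and log inside the destructor instead of letting the exception propagate. This is a generic illustration, not the openPMD-api code:

```cpp
#include <exception>
#include <iostream>

struct IterationHandle
{
    ~IterationHandle()
    {
        try
        {
            finalFlush(); // may throw, e.g. on I/O errors
        }
        catch( std::exception const & e )
        {
            // Throwing out of a destructor would terminate the program,
            // so log and continue instead.
            std::cerr << "Error while closing iteration: " << e.what() << '\n';
        }
    }

private:
    void finalFlush()
    {
        // placeholder for the real flush-on-close logic
    }
};
```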
and other wrongly committed lines
This is currently not necessary, since closing an iteration in group-based mode is a no-op. It will become relevant in streaming-based mode, where closing an iteration means discarding its corresponding data packet.
1) Remove debugging output 2) Delete buffered attribute writes after performing them
This reverts commit 1490db6.
Force-pushed from 773578a to c320171.
I don't know exactly where you saw the memory leak – is it gone now? I reverted the leak in the tests now.
Force-pushed from b27648c to 9a702f2.
include/openPMD/Iteration.hpp (outdated)
     * Background: Upon calling Iteration::close(), the openPMD API
     * will add metadata to the iteration in form of an attribute,
     * indicating that the iteration has indeed been closed.
     * Useful mainly in streaming context.
From the description alone I do not understand the difference between `closed()` and `closedByWriter()`.
Do you need to add an additional sentence that states that the reader will probe this in a streaming context?
Suggested change:
-     * Useful mainly in streaming context.
+     * Useful mainly in streaming context when a reader
+     * inquires from a writer that it is done writing.
include/openPMD/Iteration.hpp (outdated)
     * @return false Otherwise.
     */
    bool
    verifyClosed() const;
I would rename this to `closeViolated()` or `stillClosed()` or similar; `verifyClosed()` is not clear. What does this return if an iteration was never closed? The name would also depend on that.
include/openPMD/Mesh.hpp (outdated)
@@ -182,6 +182,14 @@ class Mesh : public BaseRecord< MeshRecordComponent >

    void flush_impl(std::string const&) override;
    void read() override;

    /**
Writing a doxygen string without an element that captures it will propagate the block to the next location where it can "dock". This will lead to undesirable results.
Suggested change:
-     /**
+     /*
include/openPMD/Mesh.hpp (outdated)
     *
     * @return true If closed iteration had no wrong accesses.
     * @return false Otherwise.
     */
Suggested change:
-      */
+      * @todo needs to be implemented
+      */
Is it intentional that particles already have the method but meshes don't? I think this is missing a declaration here.
I implemented the method on the base class instead (i.e. `BaseRecord`) since it also occurs somewhere in the hierarchy for particles. This is dead code; I removed it now.
src/IO/HDF5/HDF5IOHandler.cpp (outdated)
    /*
     * std::unordered_map::erase:
Suggested change:
-     /*
-      * std::unordered_map::erase:
+     /* std::unordered_map::erase:
    auto fileID_it = m_fileIDs.find( writable );
    if( fileID_it == m_fileIDs.end() )
    {
        return;
Should that warn? Did you already see that happen? I fear such things will make other problems hard to debug, so just double-checking.
I think we can let that throw instead. Closing a file should require that the file is present in the first place, shouldn't it?
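A minimal sketch of what throwing instead of silently returning could look like in the handler's close path. The map name `m_fileIDs` follows the diff above; the surrounding toy class, the method name, and the error message are simplifications for illustration:

```cpp
#include <stdexcept>
#include <unordered_map>

struct Writable; // opaque frontend node, used only via pointer here

struct ToyHDF5IOHandler
{
    std::unordered_map< Writable *, int /* hid_t in real code */ > m_fileIDs;

    void closeFile( Writable * writable )
    {
        auto fileID_it = m_fileIDs.find( writable );
        if( fileID_it == m_fileIDs.end() )
            // Instead of silently returning, fail loudly:
            throw std::runtime_error(
                "[HDF5] Trying to close a file that is not open." );

        // H5Fclose( fileID_it->second ); // the real HDF5 close would go here
        m_fileIDs.erase( fileID_it );
    }
};
```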
src/Iteration.cpp (outdated)
@@ -19,6 +19,9 @@
     * If not, see <http://www.gnu.org/licenses/>.
     */
    #include "openPMD/Iteration.hpp"

    #include <tuple>
please include std libs last
src/Iteration.cpp (outdated)
    *m_closed = CloseStatus::ClosedInFrontend;
    if( _flush )
    {
        Series * s = dynamic_cast< Series * >(
Please validate that the `dynamic_cast` is not returning a `nullptr`: #745
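A minimal sketch of the kind of check being requested. The classes here are simplified stand-ins for the real `Attributable`/`Series` hierarchy, and the helper function and error message are illustrative assumptions:

```cpp
#include <stdexcept>

// Simplified stand-ins for the real classes.
struct Attributable { virtual ~Attributable() = default; };
struct Series : Attributable { void flush() {} };

void flushContainingSeries( Attributable * parent )
{
    Series * s = dynamic_cast< Series * >( parent );
    if( s == nullptr )
        // Fail loudly instead of dereferencing a null pointer later.
        throw std::runtime_error(
            "[Iteration::close] Iteration is not owned by a Series." );
    s->flush();
}
```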
Also fix a bug in group-based mode
Force-pushed from ace60d0 to 8df6ebb.
Force-pushed from 7dc754f to b37a5c5.
Force-pushed from b37a5c5 to 55c2041.
I think I added everything from your review now, and additionally Python bindings.
Awesome, thanks a lot! 🚀 ✨
While the openPMD API currently has a data layout (file-based layout) that creates a new file for each iteration of data, no explicit method to close such a file exists yet. This PR adds a call `Iteration::close`.

TODO: `Iteration::close` is deferred, like essentially all other calls in the API so far; the file will only be closed upon `flush`ing. 1. I need to document that fact better; 2. should we keep it that way? Maybe something like `close(bool flush = true)` could be helpful?
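A minimal sketch of how the new call might be used from user code in file-based mode. The file name, dataset shape, and data are placeholders; the exact close semantics (deferred vs. `close(bool flush = true)`) are still under discussion above, and the `Access` enum is the spelling of recent openPMD-api releases (`AccessType` in older ones):

```cpp
#include <openPMD/openPMD.hpp>
#include <vector>

int main()
{
    using namespace openPMD;

    // File-based layout: one file per iteration (%T expands to the step).
    Series series( "data_%T.h5", Access::CREATE );

    std::vector< double > field( 100, 1.0 );
    auto E = series.iterations[ 0 ].meshes[ "E" ][ RecordComponent::SCALAR ];
    E.resetDataset( Dataset( Datatype::DOUBLE, { 100 } ) );
    E.storeChunk( field, { 0 }, { 100 } );

    // Explicitly close iteration 0; with the deferred semantics discussed
    // above, the underlying file is released at the latest on flush().
    series.iterations[ 0 ].close();
    series.flush();
    return 0;
}
```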