Allow SWMR mode and avoid flock problems with NXscanH5_FileRecorder #1457

Mar 24, 2021


Recently there have been several issues related to the H5 data storage, e.g. #771, #1296 and #1448. It would also be much more convenient for developers not to run into file-locking problems when inspecting the recorded data with tools like silx. This motivated me to dedicate some time to advancing #1124.

First, I wanted to spend the minimum possible time to achieve an improvement, so from the beginning I discarded more complex changes, e.g. making recorders long-lived objects in Sardana.

Second, the issue described in #1124 can be reproduced with the automatic test added in 7747c94, or manually.

To reproduce manually:

  1. Execute a scan and store its data with the NXscanH5_FileRecorder
    newfile /tmp/test.h5
    ascan mot01 0 1 1 0.1
  2. Open the file with silx (I tested with version 0.12) e.g.
    silx view /tmp/test.h5
  3. Execute another scan:
    ascan mot01 0 1 1 0.1

Now, very briefly, what this PR introduces is the concept of a write session for the NXscanH5_FileRecorder. A session can be started and ended manually using macros, or programmatically (in macro code) using a context manager.
Session opening and closing is based on the ScanFile and ScanDir environment variables, so it respects the different environment levels (door and macro).

More details in the documentation: 8941294.

I have added an atexit hook in a466bde to close any files that are still open at server shutdown, but it is never called, most probably due to taurus-org/taurus#982.

If one forgets to end the session and has already shut down the server, it is possible to clear the SWMR flag with h5clear -s <file_name> (not available in Debian stretch, but available in Debian buster and conda).

Some ideas on how to improve it:

  1. Add a pre-newfile hook place to newfile to allow automatically closing and opening sessions when changing files
  2. Allow controlling the session from a distinct ScanFile and ScanDir level, or directly using a file path.
  3. Automatically offer to end the previous session when a new session starts (@cpascual idea, thanks!)

I have a feeling that 2 and 3 are more time-consuming, while 1 would be very easy to add. Anyway, I would like to hear your opinion about all this before investing more time in this topic.

If we agree to merge it, I would prefer to mark it as an experimental feature (allowing API changes if strictly necessary).

The Sardana H5 recorder opens and closes the file at every scan.
This prevents other applications, e.g. PyMCA or silx, from opening it
for reading between scans and keeping it open
while the next scan is launched. This is described in sardana-org#1124.

Add test to demonstrate this issue.
The Sardana H5 recorder opens and closes the file at every scan.
This prevents other applications, e.g. PyMCA or silx, from opening it
for reading between scans and keeping it open
while the next scan is launched. This is described in sardana-org#1124.

- Add a singleton handler of H5 files to keep them open
- In NXscanH5_FileRecorder, open the file only if it is not already open
- Move "populate instrument info" and "create NXData" to startRecordList instead of endRecordList (required by SWMR)
- Adapt test to avoid failure
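
As a rough sketch of the "open only if not already opened" idea (not the handler implemented in this PR), a registry can hand out one shared handle per path. `FileRegistry` and `fake_opener` are hypothetical names, and `io.BytesIO` stands in for a real HDF5 file object so that the example runs without h5py.

```python
import io


class FileRegistry:
    """Keep file handles open across scans and open each path only once."""

    def __init__(self, opener):
        # opener is any callable path -> handle; in a real recorder it would
        # wrap the HDF5 open call (assumption, not shown here).
        self._opener = opener
        self._files = {}

    def is_open(self, path):
        return path in self._files

    def open(self, path):
        # Reuse the existing handle instead of reopening the file.
        if path not in self._files:
            self._files[path] = self._opener(path)
        return self._files[path]

    def close(self, path):
        self._files.pop(path).close()


opened = []


def fake_opener(path):
    """Record each real open and return an in-memory stand-in handle."""
    opened.append(path)
    return io.BytesIO()


registry = FileRegistry(fake_opener)
first = registry.open("/tmp/test.h5")   # first scan: the file is opened
second = registry.open("/tmp/test.h5")  # second scan: the same handle is reused
print(first is second, opened)
registry.close("/tmp/test.h5")
```

A single module-level instance of such a registry would play the role of the singleton.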
There are use cases where we need to retrieve an environment variable
even if the macro has been stopped/aborted, e.g. the h5_write_session
context manager.

Add _getEnv non-mAPI method to Macro class.
h5storage macros are used to control the h5 write sessions.
Importing the h5util module registers an atexit callback.
This is an unwanted secondary effect e.g. when building docs.

Register the cleanup when opening the first file and unregister it when
closing the last file.
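
The register/unregister pattern described above can be sketched with the standard `atexit` module. This is only an illustration under assumptions: `open_file`, `close_file` and `cleanup_is_registered` are hypothetical helpers, and the opener is passed in so that the example does not depend on h5py.

```python
import atexit
import io

_files = {}
_registered = False


def _close_all():
    """Exit hook: close every file that is still open."""
    for path in list(_files):
        close_file(path)


def cleanup_is_registered():
    return _registered


def open_file(path, opener):
    global _registered
    if path in _files:
        return _files[path]
    handle = opener(path)
    if not _files:
        # First open file: only now install the exit hook, so that merely
        # importing the module (e.g. when building docs) has no side effect.
        atexit.register(_close_all)
        _registered = True
    _files[path] = handle
    return handle


def close_file(path):
    global _registered
    _files.pop(path).close()
    if not _files:
        # Last file closed: the hook is no longer needed.
        atexit.unregister(_close_all)
        _registered = False


open_file("/tmp/test.h5", lambda p: io.BytesIO())
print(cleanup_is_registered())
close_file("/tmp/test.h5")
print(cleanup_is_registered())
```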
Fixed Merge Conflicts in:
d86b76a introduced merge conflicts. I resolved them in 70ff992

Strange... I cannot reproduce the problem

Maybe I am doing something different... or maybe it is that the problem is not triggered with a more recent stack.

Here is what I do:

Prepare environment

Install all dependencies from conda-forge
Then install sardana from develop git with pip -e

Executing transaction: done

╰─>$ conda activate sdn-1457

╰─>$ cd src/sardana  (sdn-1457)

╰─>$ pip install -e .  (sdn-1457)
Successfully installed sardana

Run Sardana:

On a different console, run the Sardana server (a pristine new instance)

╰─>$ conda activate sdn-1457
╰─>$ Sardana kk-1457  (sdn-1457)
kk-1457 does not exist. Do you wish to create a new one (Y/n) ? 
MainThread     ERROR    2021-03-22 18:47:42,805 ModuleManager: Invalid module HklPseudoMotorController
MainThread     WARNING  2021-03-22 18:47:46,185 Received elements error event Pool was shutdown or is inaccessible
Ready to accept request

test issue with spock (not reproduced)

Use a pristine profile, run sar_demo, run newfile, launch a scan, then run silx view (keep it open) and then launch another scan ==> all OK:

╰─>$ spock --profile=kk-1457  (sdn-1457)
IPython profile: kk-1457

Connected to Door_kk-1457_1

Door_kk-1457_1 [1]: %sar_demo

Door_kk-1457_1 [2]: newfile /tmp/test-1457.h5
ScanDir is      : /tmp
ScanFile set to : test-1457.h5
Next scan is    : #1

Door_kk-1457_1 [3]: ascan mot18 0 1 1 0.1
Operation will be saved in /tmp/test-1457.h5 (HDF5::NXscan from NXscanH5_FileRecorder)
Scan #1 started at Mon Mar 22 18:50:37 2021. It will take at least 0:00:00.482843
 #Pt No    mot18      ct17      ct18      ct19      ct20       dt   
   0         0        0.1       0.2       0.3       0.4     0.0502625
   1         1        0.1       0.2       0.3       0.4     0.500065
Operation saved in /tmp/test-1457.h5 (HDF5::NXscan)
Scan #1 ended at Mon Mar 22 18:50:38 2021, taking 0:00:00.634184. Dead time 68.5% (motion dead time 50.5%)

Door_kk-1457_1 [4]: !silx view /tmp/test-1457.h5 &

Door_kk-1457_1 [5]: is not available is not available visualization is not available instantiation failed. View is ignored
Door_kk-1457_1 [5]: 

Door_kk-1457_1 [5]: ascan mot18 0 1 1 0.1
Operation will be saved in /tmp/test-1457.h5 (HDF5::NXscan from NXscanH5_FileRecorder)
Scan #2 started at Mon Mar 22 18:51:48 2021. It will take at least 0:00:00.765685
 #Pt No    mot18      ct17      ct18      ct19      ct20       dt   
   0         0        0.1       0.2       0.3       0.4     0.32062 
   1         1        0.1       0.2       0.3       0.4     0.743912
Operation saved in /tmp/test-1457.h5 (HDF5::NXscan)
Scan #2 ended at Mon Mar 22 18:51:49 2021, taking 0:00:00.873439. Dead time 77.1% (motion dead time 68.6%)

Door_kk-1457_1 [6]: 

test_swmr() is prone to a race condition because it reuses a
threading event. Fix it by using two events instead of reusing one.
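
A minimal sketch of the two-event handshake follows. It is not the actual test code: the reader/writer roles and the name `run_handshake` are assumptions made for illustration. Each event is set by exactly one side and is never cleared, which removes the window in which a reused event could be cleared or observed at the wrong moment.

```python
import threading


def run_handshake():
    reader_ready = threading.Event()  # set only by the reader
    writer_done = threading.Event()   # set only by the writer
    log = []

    def reader():
        log.append("reader: file opened")
        reader_ready.set()
        # Wait for the writer without touching reader_ready again.
        writer_done.wait(timeout=5)
        log.append("reader: saw data")

    def writer():
        # Do not write until the reader has the file open.
        reader_ready.wait(timeout=5)
        log.append("writer: data written")
        writer_done.set()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


print(run_handshake())
```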

Also add a test of swmr without h5_write_session context (using
The reason for not reproducing the issue with a newer stack seems to be related to the reader (in this case silx).
To confirm this, I added (together with @reszelaz) a test (test_swmr_without_h5_session) that does reproduce the issue unless the HDF5_USE_FILE_LOCKING environment variable is set to "FALSE" (the test is left in the testsuite with the variable set).


(the test is left in the testsuite with the variable set).

This variable won't be considered by the libhdf5 version from stretch, so the test will currently fail in the CI.

test_swmr_without_h5_session requires support for
HDF5_USE_FILE_LOCKING environment variable. Mark the test as
expected failure if hdf5 does not support it.
This variable won't be considered in the libhdf5 version from stretch

Well spotted, @reszelaz!
I marked the test as xfail for hdf5<1.10.1
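
For illustration, the version gate can be sketched as below. This is not the code used in the testsuite: `supports_file_locking_env` and `disable_file_locking` are hypothetical helpers, the 1.10.1 threshold is the one mentioned above, and the version string is assumed to come from something like `h5py.version.hdf5_version`.

```python
import os
import re

# Threshold taken from the comment above (xfail for hdf5<1.10.1).
MIN_HDF5_FOR_LOCKING_ENV = (1, 10, 1)


def parse_version(text):
    """Extract (major, minor, release) from strings such as '1.10.0-patch1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised HDF5 version: %r" % text)
    return tuple(int(part) for part in match.groups())


def supports_file_locking_env(hdf5_version):
    """True if this HDF5 version honours HDF5_USE_FILE_LOCKING."""
    return parse_version(hdf5_version) >= MIN_HDF5_FOR_LOCKING_ENV


def disable_file_locking(environ=os.environ):
    """Ask HDF5 not to lock files; set this before the file is opened."""
    environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


for version in ("1.10.0-patch1", "1.10.1", "1.12.0"):
    print(version, supports_file_locking_env(version))
```

In a pytest testsuite, the negated result of such a check could be passed as the condition of `pytest.mark.xfail`.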

Carlos Pascual added 7 commits March 24, 2021 16:30
Provide a more user-friendly explanation when there is
a problem handling an hdf5 file in swmr mode
Print file name(s) , SWMR mode and hdf5 compatibility info
Some docstrings referred to outdated version of the start
session macro. Fix them
Do not allow to start a new session if one is already open
This allows closing sessions even if the ScanDir/ScanFile variables changed
- Add macros to start/end h5 sessions based on file paths instead of
ScanDir and ScanFile
- Support a path argument in the h5_write_session context
- Do not block creation of a new session even if some session is already
started (just print a hint on how to close it)
I checked the existing code and added the following UI-related improvements:

  • some more info printed by the h5 session macros
  • added h5_start_session_path and h5_end_session_path to support arbitrary file paths (not tied to ScanFile / ScanDir)

(most code done in pair-programming with @reszelaz )

I think it is ok to merge as is.


After some changes, it LGTM

