Remove corrupted lines in xvg files #126
Comments
@hannahbaumann how would you want this feature to work, if it were in alchemlyb? Can you outline Python code?
I think for now it would be enough if it checks whether the length of the last line is correct and removes the last line if it is too short. Similar to the
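The check described in this comment could be sketched roughly as follows. This is only an illustration: `drop_truncated_last_line` is a hypothetical helper, not an alchemlyb function, and it assumes that lines starting with `#` or `@` are XVG comments or metadata and that the first data row is complete.

```python
def drop_truncated_last_line(lines):
    """Return the lines without a final data row that has too few columns.

    Hypothetical helper for illustration; not part of alchemlyb.
    """
    # Indices of data rows: non-empty lines that are not comments/metadata.
    data_idx = [
        i for i, line in enumerate(lines)
        if line.strip() and not line.lstrip().startswith(("#", "@"))
    ]
    if len(data_idx) < 2:
        # Nothing to compare the last row against.
        return list(lines)

    # Assumption: the first data row has the correct number of columns.
    expected = len(lines[data_idx[0]].split())
    last = data_idx[-1]
    if len(lines[last].split()) < expected:
        return [line for i, line in enumerate(lines) if i != last]
    return list(lines)
```

Note that a column count alone cannot detect a row whose final number was cut off mid-digit, since such a row still has the expected number of fields.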
Do you want the alchemlyb XVG parser to just ignore corrupt lines, or do you want the function to be "somewhere" in alchemlyb so that you can import it and use it as part of your workflow? I am trying to gauge where this would fit in.
The current philosophy of the library is to read data and make them available as dataframes. A function that writes out the data does not fit particularly well into this scheme, I feel. However, we could consider adding a slower XVG parser as an alternative to the fast pandas.read_csv()-based one (alchemlyb/src/alchemlyb/parsing/gmx.py, lines 300 to 302 in b068776), together with an option in the extract_* functions that enables reading of corrupt data files. This could then switch to the slow line-by-line parser, which could be based on the code at https://github.com/MobleyLab/alchemical-analysis/blob/master/alchemical_analysis/utils/corruptxvg.py, with the difference that it needs to produce a dataframe in the same way as the existing code, except that incomplete lines are omitted.
I'd be happy to review a PR along the lines above.
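A slow line-by-line parser of the kind proposed here might look roughly like the sketch below. It is an assumption-laden illustration, not the alchemlyb implementation: `read_xvg_tolerant` is a hypothetical name, columns are left with integer labels instead of being named from the XVG legend, and the first data row is assumed to be complete.

```python
import io

import pandas as pd


def read_xvg_tolerant(handle):
    """Parse XVG text line by line, omitting incomplete or unparsable rows.

    Hypothetical sketch; not part of alchemlyb.
    """
    rows = []
    ncols = None
    for line in handle:
        stripped = line.strip()
        # Skip blank lines, comments (#) and metadata/legend lines (@).
        if not stripped or stripped.startswith(("#", "@")):
            continue
        fields = stripped.split()
        if ncols is None:
            ncols = len(fields)  # assumption: first data row is complete
        if len(fields) != ncols:
            continue  # incomplete (or over-long) row: omit it
        try:
            rows.append([float(x) for x in fields])
        except ValueError:
            continue  # a field that is not a number: omit the row
    return pd.DataFrame(rows)


# Usage: the last row was cut off while the file was being written.
text = "# comment\n@ legend\n0.0 1.0 2.0\n1.0 1.5 2.5\n2.0 1.\n"
df = read_xvg_tolerant(io.StringIO(text))
```

The trade-off is the one raised in the comment: this path is slower than `pandas.read_csv()` but never fails on a partially written row.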
This is my understanding of the issue. Two questions are raised here.
The first is quite easy to solve. pd.read_csv will give a line full of NaN when the line is not complete. My solution for the Gromacs parser is to add a fix to alchemlyb/src/alchemlyb/parsing/gmx.py at line 306 in b068776.
For the other problem, the only question is how we define the boundary between the parser and preprocessing. Should the removal of corrupted lines and the dropping of duplicates be put in the parser, or should they go into preprocessing, so that the parser retains as much of the original information as possible?
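To illustrate the NaN behaviour mentioned in this comment, here is a minimal sketch using plain whitespace-separated data. It is not the code from the alchemlyb parser; the sample text and the use of `dropna()` as the clean-up step are assumptions for illustration.

```python
import io

import pandas as pd

# Three data rows; the last one was truncated before the third column.
text = "0.0 1.0 2.0\n1.0 1.5 2.5\n2.0 1.\n"

# read_csv pads the missing trailing field of the short row with NaN.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Dropping every row that contains a NaN removes the truncated row.
clean = df.dropna()
```

Whether a call like `dropna()` lives inside the parser or in a separate preprocessing function is exactly the boundary question raised above; the pandas operation is the same either way.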
I would consider it a preprocessing step, like cleaning data.
Hi,
when I want to analyze free energy differences while simulations are still running, the last line of the xvg files is often corrupted (not fully written yet) and alchemlyb fails to do the analysis. Alchemical analysis has a feature that repairs those files, so I usually run that first and then run alchemlyb on the repaired xvg files. Is it possible to move that feature into alchemlyb?
https://github.com/MobleyLab/alchemical-analysis/blob/master/alchemical_analysis/utils/corruptxvg.py