A unified way of unit conversion #125

xiki-tempula · 2021-04-28T19:42:07Z

Currently, the internal unit is always kT. However, when presenting the data, it might be desirable to use kcal/mol or kJ/mol instead. Thus, it would be a good idea to have a discussion on how to do the unit conversion.

Currently, the only place that requires unit conversion is the plotting functions and later on the workflow #114.

My proposal is that the plotting functions take a conversion factor that converts kT to the desired unit.
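As a rough illustration of what such a conversion factor looks like: one kT per particle corresponds to RT per mole, so the factor depends on the temperature. The helper name `kT_conversion_factor` and the unit labels below are hypothetical and not part of alchemlyb; this is only a sketch of the arithmetic.

```python
# Molar gas constant in kJ/(mol K); exact under the 2019 SI definition.
R_KJ_PER_MOL_K = 8.31446261815324e-3
# Thermochemical calorie: 1 kcal = 4.184 kJ.
KJ_PER_KCAL = 4.184


def kT_conversion_factor(temperature, unit="kcal/mol"):
    """Return the factor that converts an energy in kT to `unit`.

    `temperature` is in kelvin. Hypothetical helper, for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (in kelvin)")
    kT_in_kJ_per_mol = R_KJ_PER_MOL_K * temperature
    if unit == "kT":
        return 1.0
    if unit == "kJ/mol":
        return kT_in_kJ_per_mol
    if unit == "kcal/mol":
        return kT_in_kJ_per_mol / KJ_PER_KCAL
    raise ValueError(f"unsupported unit: {unit!r}")


factor = kT_conversion_factor(300.0, "kcal/mol")
print(f"1 kT at 300 K = {factor:.4f} kcal/mol")
```

A plotting function could then multiply its kT values by this factor before drawing, while everything upstream stays in kT.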

orbeckst · 2021-04-28T20:24:06Z

One thing I would like to keep constant is that the internal units are always kT — that's one of the fundamental assumptions. The parsers make sure that the data are in this common format. This means that unit conversion should only happen from kT into something else for downstream processing.

The plotting functions take DataFrames as input so a natural way inside alchemlyb would be to have something like a postprocessor function (say postprocessors.units.to_kcalmol()). I'd say a postprocessor is similar to a preprocessor but creates data transformations for downstream analysis or plotting. @xiki-tempula brought up using DataFrame attributes (DataFrame.attrs, flagged as experimental in the pandas API) to store metadata. We could store the unit as metadata to check that we don't make a mistake on conversion. We could also store the temperature there, if known, which would make conversion from kT to absolute units less error-prone. (However, using attrs would require us to carefully check if and how to propagate attrs through the various functions and estimators. It might be a bigger project if we wanted to keep tracking metadata.)

We could also look into any number of Python unit packages, although I'd favor a lightweight solution, especially as we are really only concerned with energy conversion from a fixed unit system. But if anyone has any ideas, please make suggestions.

Our plotting function could call the appropriate unit conversion depending on the value of units — at the moment, that just sets the axes labels but we could use it in a more substantial role.

@dotsdl @davidlmobley , any suggestions?

What do you think @xiki-tempula ?

davidlmobley · 2021-04-28T20:50:23Z

I like the idea of carrying around units to avoid problems, so this sounds good.

xiki-tempula · 2021-04-29T13:45:40Z

So I think the current general workflow would be

Input file (Energy: kT; kcal/mol; kJ/mol, Temperature K)
↓
Preprocessing (Energy: kT)
↓
Estimator (Energy: kT)
↓
Graphic/text output (Energy: kT; kcal/mol; kJ/mol), which requires the user to rescale the unit.

So the new plan would be

Input file (Energy: kT; kcal/mol; kJ/mol, Temperature K)
↓
Preprocessing (Energy: kT)
↓
Estimator (Energy: kT)
↓
Unit conversion (requires Temperature K)
↓
Graphic/text output (Energy: kT; kcal/mol; kJ/mol).

So the Unit conversion could happen through postprocessors.units.to_kcalmol().
The question would then be how to pass the temperature to this function.

The temperature could be added to DataFrame.attrs, but then there is the problem of propagating it from the dhdl data frame to the d_delta_f_ data frame.

I think one way would be to add the temperature as an input, postprocessors.units.to_kcalmol(temperature).
Another way would be to set the temperature as a global variable, alchemlyb.envs.temperature.
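A minimal sketch of the first option, with the temperature passed explicitly so that no global state is involved. The signature `to_kcalmol(df, temperature)` is an assumption made for illustration (the thread only names the function), and the input numbers are made up.

```python
import pandas as pd

R_KJ_PER_MOL_K = 8.31446261815324e-3  # molar gas constant, kJ/(mol K)
KJ_PER_KCAL = 4.184                   # thermochemical calorie


def to_kcalmol(df, temperature):
    """Return a new DataFrame with energies converted from kT to kcal/mol.

    Hypothetical signature: `temperature` is passed explicitly in kelvin.
    """
    factor = R_KJ_PER_MOL_K * temperature / KJ_PER_KCAL
    # Multiplying by a scalar yields a new DataFrame; the input is untouched.
    return df * factor


# Toy free energy differences in kT (made-up numbers, for illustration only).
delta_f = pd.DataFrame([[0.0, 1.0], [-1.0, 0.0]])
delta_f_kcal = to_kcalmol(delta_f, temperature=300.0)
print(delta_f_kcal)
```

The cost of this variant is that the caller has to remember the temperature and supply it again at the output stage, which is the bookkeeping that storing it in attrs would avoid.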

I'm not quite sure as to how to pass units around.

orbeckst · 2021-04-30T14:56:08Z

I had in mind what you sketched out in your new plan.

Originally I was thinking that the Graphic/text output step can do a unit conversion internally (using the same tools that one can use explicitly). However, if we find a way to carry units and temperature around with the df then this can be done automatically.

I do not like to keep global state with global variables. All of alchemlyb is written in a way to avoid keeping global state as much as possible.

xiki-tempula · 2021-05-04T10:52:25Z

So I would imagine using attrs to pass the temperature around (if the temperature is not set as a global variable).

Input file (Energy: kT; kcal/mol; kJ/mol, Temperature K): extract_u_nk(xvg, T=300) > u_nk.attrs['temp'] = 300
↓
Preprocessing (Energy: kT): statistical_inefficiency(u_nk) > u_nk_sample.attrs['temp'] = 300
↓
Estimator (Energy: kT) > MBAR().fit(u_nk_sample) > mbar.delta_f_.attrs['temp'] = 300; mbar.d_delta_f_.attrs['temp'] = 300 (one might need to make sure that the attribute temp is being passed through)
↓
Graphic/text output (Energy: kT; kcal/mol; kJ/mol), which uses the internal postprocessors.units.to_kcalmol(temperature) to do the unit conversion.

orbeckst · 2021-05-04T19:13:07Z

Makes sense to me. In addition to df.attrs['temperature'] (let's write out the name instead of "temp", which could also mean "temporary") I'd then also include df.attrs['unit'] = "kT" for completeness.

The main challenge is going to be what happens when someone manipulates the df externally and the attrs get lost. They do not automatically get copied on slicing a df. I have two ideas:

1. The simplest solution is to have all alchemlyb functions that input df check the df and raise an error if the attrs are missing. (We might be able to write a decorator for this check.) Then we just need to document which attrs are required.
2. Alternatively, we can subclass pandas.DataFrame as our own df and then add code to copy attrs.

However, I'd like to keep things simple so I am in favor of 1.
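A rough sketch of what the decorator for the first idea might look like. The names `require_attrs` and `describe_units` are hypothetical and only serve the illustration; the attribute keys follow the ones proposed in this comment.

```python
import functools

import pandas as pd


def require_attrs(*required):
    """Build a decorator that rejects DataFrames missing any of `required` attrs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            missing = [key for key in required if key not in df.attrs]
            if missing:
                raise ValueError(f"DataFrame is missing required attrs: {missing}")
            return func(df, *args, **kwargs)
        return wrapper
    return decorator


@require_attrs("temperature", "unit")
def describe_units(df):
    """Toy consumer that relies on the metadata being present."""
    return f"{df.attrs['unit']} at {df.attrs['temperature']} K"


df = pd.DataFrame({"energy": [0.0, 1.5]})
df.attrs["temperature"] = 300
df.attrs["unit"] = "kT"
print(describe_units(df))
```

A DataFrame that lost its attrs along the way would fail loudly at the function boundary instead of being silently treated as kT at an unknown temperature.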

orbeckst · 2021-05-04T19:16:18Z

Btw, maybe attrs['unit'] is not very clear, given that dataframes can have different columns. Maybe attrs['energy_unit'] and attrs['time_unit'] (if necessary) would be clearer?

xiki-tempula · 2021-05-04T21:33:41Z

In the Input file layer, the output data frame would have the attributes.

I'm thinking of having a decorator pass the attributes from the input data frame to the output data frame during the preprocessing and estimator layer.
Something similar to

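import functools

def pass_attrs(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(input_dataframe, *args, **kwargs):
        dataframe = func(input_dataframe, *args, **kwargs)
        # copy the dict so that input and output do not share the same attrs
        dataframe.attrs = dict(input_dataframe.attrs)
        return dataframe
    return wrapper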

The Graphic/text output layer would then just use these attributes to do the work.

I suppose the checking procedure could be implemented in the Graphic/text output layer, while the preprocessing and estimator layers would not check the attributes as they do not use them. I think it might be better for the functions in the Graphic/text output layer to check the attributes on a case-by-case basis, as the inputs are quite heterogeneous.
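To make the output-layer idea concrete, here is a sketch of a conversion function that does its own attribute check and then converts. The function name `to_kcalmol_from_attrs` is hypothetical; the keys 'temperature' and 'energy_unit' are the ones suggested earlier in this thread, and the numbers are made up.

```python
import pandas as pd

R_KJ_PER_MOL_K = 8.31446261815324e-3  # molar gas constant, kJ/(mol K)
KJ_PER_KCAL = 4.184                   # thermochemical calorie


def to_kcalmol_from_attrs(df):
    """Convert a kT DataFrame to kcal/mol using the metadata stored in df.attrs."""
    # Check only what this particular function needs.
    if "temperature" not in df.attrs:
        raise ValueError("attrs['temperature'] is required for unit conversion")
    if df.attrs.get("energy_unit") != "kT":
        raise ValueError("expected attrs['energy_unit'] == 'kT'")
    factor = R_KJ_PER_MOL_K * df.attrs["temperature"] / KJ_PER_KCAL
    converted = df * factor
    # Carry the metadata over explicitly and record the new unit.
    converted.attrs = dict(df.attrs)
    converted.attrs["energy_unit"] = "kcal/mol"
    return converted


delta_f = pd.DataFrame([[0.0, 2.0], [-2.0, 0.0]])
delta_f.attrs["temperature"] = 300
delta_f.attrs["energy_unit"] = "kT"
delta_f_kcal = to_kcalmol_from_attrs(delta_f)
print(delta_f_kcal.attrs["energy_unit"])
```

Updating the unit in the returned DataFrame also means that a second conversion call on the result is rejected rather than applied twice.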

orbeckst · 2021-05-11T00:25:02Z

I think your plan is good.

xiki-tempula · 2021-05-15T14:55:25Z

I tried to implement this plan in #129.

It seems that attrs is only supported from pandas 1.0, which doesn't support py2 and py35.
attrs only works as intended from pandas 1.2, which doesn't support py36.

orbeckst · 2021-05-15T15:19:27Z

Can you please raise an issue for dropping py 2.7 and 3.5 support?


orbeckst added enhancement estimators parsers visualisation labels May 4, 2021

orbeckst mentioned this issue May 4, 2021

Added functionality and some fixes alchemistry/flamel#1

Merged

xiki-tempula mentioned this issue May 15, 2021

Add attr to the dataframe #129

Merged

xiki-tempula mentioned this issue May 15, 2021

Drop support for py2, py35 and py36 #130

Closed

orbeckst mentioned this issue Jun 10, 2021

adopt NEP29 for support of Python versions #140

Closed

orbeckst added this to the 0.5.0 milestone Jun 10, 2021

xiki-tempula closed this as completed in #129 Jun 28, 2021

orbeckst mentioned this issue Jul 4, 2021

Update estimators-ti.rst #146

Closed

orbeckst added a commit that referenced this issue Aug 1, 2021

update CHANGES for 0.5.0

ff7ed5d

- state explicitly supported Python on OS - move Changes to top position and reordered changes by importance - added explicit item for unit-awareness (#125) - add date
