Replies: 11 comments
-
If there's a common function that we can use for "everything" (and just plug in the estimator), then alchemlyb would be a good place, I think, along the lines that we only have to write and test the code once. I generally like building blocks that I can freely combine. Something like

```python
def bootstrapped(data, estimator):
    ...
    return mean, error
```

Alternatively, we could hide the machinery in the

One advantage of doing it at the alchemlyb level is that it might not be difficult to run alchemlyb with dask (essentially, use the dask.DataFrame), and then the bootstrapping can be parallelized without effort. A while ago @dotsdl played around with alchemlyb and dask; I can't quite remember how much would need to be changed.
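A minimal sketch of such a building block in Python with NumPy, assuming `data` is a 1-D array of already decorrelated samples and `estimator` is any callable that maps such an array to a scalar. The `n_bootstraps` and `seed` parameters are illustrative additions, not an existing alchemlyb API; the error is taken as the standard deviation of the bootstrap distribution.

```python
import numpy as np


def bootstrapped(data, estimator, n_bootstraps=200, seed=None):
    """Return (mean, error) of `estimator` over bootstrap resamples of `data`.

    Assumes `data` is a 1-D array of uncorrelated samples and `estimator` is any
    callable mapping such an array to a scalar (hypothetical signature).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    results = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)   # n indices drawn with replacement
        results[b] = estimator(data[idx])
    # the spread of the bootstrap distribution serves as the error estimate
    return results.mean(), results.std(ddof=1)


# Usage with synthetic data and a trivial "estimator" (the sample mean)
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
mean, error = bootstrapped(samples, np.mean, seed=1)
```

Because each iteration only needs its own resample, the loop body is independent across iterations, which is what would make it straightforward to hand off to a parallel backend.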
-
That might be a good idea. It could return a dictionary of all the bootstrapped results, along with the uncertainty estimate. I'll think about how to organize this. One issue is that the data will look different for each estimator, which would require fairly different conditionals inside the bootstrapping function. Also, if one were analyzing K states, calculating the free energy with BAR executed pairwise, one would want to bootstrap over the entire data set of K states; i.e., you would need to bootstrap the entire procedure, not a single estimator.
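To bootstrap the entire procedure rather than a single estimator, every state's data can be resampled in the same iteration and the whole multi-state calculation rerun on each resample. The sketch below assumes a hypothetical layout in which `u_per_state[k]` is an `(n_k, K)` array of reduced potentials (in kT) of the samples drawn from state k, evaluated at every state. It uses pairwise exponential averaging (EXP) along the string of states as a simple stand-in for the full procedure; `exp_chain` and `bootstrap_procedure` are illustrative names, and the returned dictionary holds all bootstrapped results along with the uncertainty estimate.

```python
import numpy as np


def exp_chain(u_per_state):
    """Reduced free energy difference from state 0 to state K-1, obtained by
    applying exponential averaging (EXP) to each neighbouring pair.

    Hypothetical layout: u_per_state[k] is an (n_k, K) array holding the reduced
    potentials (in kT) of the samples drawn from state k, evaluated at every state.
    """
    total = 0.0
    for k in range(len(u_per_state) - 1):
        du = u_per_state[k][:, k + 1] - u_per_state[k][:, k]
        # dF = -ln <exp(-du)>_k, shifted by min(du) for numerical stability
        shift = du.min()
        total += shift - np.log(np.mean(np.exp(-(du - shift))))
    return total


def bootstrap_procedure(u_per_state, procedure, n_bootstraps=200, seed=None):
    """Resample every state's samples (with replacement) in the same iteration,
    then rerun the whole multi-state procedure on the resampled data set."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        resampled = [u[rng.integers(0, len(u), size=len(u))] for u in u_per_state]
        samples[b] = procedure(resampled)
    return {
        "estimate": procedure(u_per_state),
        "bootstrap_samples": samples,
        "error": samples.std(ddof=1),
    }


# Toy usage: three states whose neighbouring reduced-potential differences are
# Gaussian with mean 1.0 and standard deviation 0.5 (synthetic data).
rng = np.random.default_rng(0)
K, n = 3, 2000
u_per_state = []
for k in range(K):
    u = np.zeros((n, K))
    if k + 1 < K:
        u[:, k + 1] = rng.normal(1.0, 0.5, size=n)
    u_per_state.append(u)
result = bootstrap_procedure(u_per_state, exp_chain, seed=1)
```

Since `procedure` is just a callable on the list of per-state arrays, the same wrapper could drive a pairwise BAR chain or any other multi-state pipeline without estimator-specific conditionals inside it.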
-
@mrshirts do you have a paper or writeup you can point to for this approach? I'd be happy to prototype something. We may be able to steal design inspiration from
-
So, I don't really have a good simple paper. http://www.alchemistry.org/wiki/Analyzing_Simulation_Results#Bootstrap_Sampling is a good summary. I agree that something like

After the bootstrap sampling with replacement, everything else is pretty trivial. You calculate your function on each of the bootstrapped samples. You then have a set of results (which could be a multivalued return), and you can simply return a list of all the answers. You can optionally return various statistical measures of this list for each of the results: mean, standard deviation, confidence intervals. One could make decorrelation of the data sets part of the algorithm, but it would perhaps be more modular to do the decorrelation as a separate step.
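The workflow described here (resample with replacement, apply the function, keep every result, then optionally summarize) might look like the following sketch. Decorrelation is kept as a separate, hypothetical `subsample` helper that assumes the statistical inefficiency `g` has already been estimated elsewhere. The confidence interval shown is the simple percentile bootstrap interval, which is only one of several possible choices; all function names are illustrative.

```python
import numpy as np


def subsample(x, g):
    """Hypothetical decorrelation step: keep every ceil(g)-th sample, where g is
    a statistical inefficiency assumed to have been estimated beforehand."""
    stride = max(1, int(np.ceil(g)))
    return np.asarray(x)[::stride]


def bootstrap_samples(data, func, n_bootstraps=500, seed=None):
    """Apply `func` to each bootstrap resample (drawn with replacement).
    `func` may return several values; the result is an array of shape
    (n_bootstraps, n_outputs) holding every answer."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    rows = [np.atleast_1d(func(data[rng.integers(0, n, size=n)]))
            for _ in range(n_bootstraps)]
    return np.vstack(rows)


def summarize(results, ci=0.95):
    """Optional per-output statistics: mean, standard deviation and a
    percentile bootstrap confidence interval."""
    lo, hi = np.percentile(results, [50 * (1 - ci), 50 * (1 + ci)], axis=0)
    return {
        "mean": results.mean(axis=0),
        "std": results.std(axis=0, ddof=1),
        "ci_low": lo,
        "ci_high": hi,
    }


# Usage: decorrelate first, then bootstrap a function with two return values
rng = np.random.default_rng(0)
x = subsample(rng.normal(size=3000), g=3.0)
res = bootstrap_samples(x, lambda s: (s.mean(), s.std()), seed=1)
stats = summarize(res)
```

Returning the full array of results keeps the summary step optional: callers who need something other than mean, standard deviation, or a percentile interval can compute it from `res` directly.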
-
Dear alchemlyb team!
-
Hey all, after discussions with @wildromi, I've committed to working on this issue over the next two weeks. I expect the first iteration to be usable but probably not the approach we end up with. I'll post a WIP PR as soon as I can.
-
Hi David, I'd love to talk more about this, as I've been dealing with similar setups for a while. Shoot me an email at my CU address and we can strategize further. One key issue, for example, is bootstrapping simultaneously over multiple time series.
-
@mrshirts sent! I'm looking forward to leveraging your experience to jumpstart the approach.
-
@dotsdl: take a look at https://github.com/choderalab/pymbar/blob/pmf/pymbar/pmf.py, lines 590 to 615, to get a sense of how bootstrapping works in a complicated case (here, calculating a potential of mean force).
-
I met with @mrshirts yesterday, and we aligned on an approach. I have started a WIP PR in #94. There is still a list of things to do, but we have the start of our implementation. You can check out how things work so far in this gist. Comments welcome! Please don't use this in production work yet until we have tests ensuring that
-
The gist for #94 has been updated; it requires components of #98, which can be played with on this branch. |
-
Should bootstrapping be implemented at the alchemlyb level or the pymbar level? For MBAR, it would be better at the pymbar level, since it can be easily encapsulated (the user doesn't have to worry about it), and one can request either uncertainty estimate (analytical or bootstrapped).
For BAR over several states, the bootstrapping needs to be done at the level ABOVE the BAR call, since we need to bootstrap all of the data simultaneously before feeding it into BAR. The same holds for EXP applied to a string of states.
Thoughts?
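A sketch of bootstrapping at the level above the BAR call for a string of states. Here `bar` solves the self-consistent BAR equation in reduced units (kT) by bisection; it is a from-scratch stand-in written for illustration, not pymbar's implementation. The layout is a hypothetical one in which `u_per_state[k]` is an `(n_k, K)` array of reduced potentials of the samples from state k evaluated at every state. Each bootstrap iteration resamples every state once, so the samples of state k enter both the (k-1, k) and (k, k+1) BAR calls, and the correlation between neighbouring estimates carries through to the error.

```python
import numpy as np


def _fermi(a):
    # 1 / (1 + exp(a)), written with tanh to avoid overflow for large |a|
    return 0.5 * (1.0 - np.tanh(0.5 * a))


def bar(w_F, w_R, n_iter=100):
    """Bennett acceptance ratio estimate of the reduced free energy difference,
    found by bisection on the self-consistent BAR equation (works in kT)."""
    w_F, w_R = np.asarray(w_F, float), np.asarray(w_R, float)
    M = np.log(len(w_F) / len(w_R))

    def f(dF):
        # monotonically increasing in dF; its root is the BAR estimate
        return _fermi(M + w_F - dF).sum() - _fermi(-M + w_R + dF).sum()

    lo, hi = -1.0, 1.0
    while f(lo) > 0:  # widen the bracket until it contains the root
        lo *= 2.0
    while f(hi) < 0:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def bar_chain(u_per_state):
    """Sum of pairwise BAR estimates between neighbouring states
    (hypothetical (n_k, K) reduced-potential layout per state)."""
    total = 0.0
    for k in range(len(u_per_state) - 1):
        ua, ub = u_per_state[k], u_per_state[k + 1]
        w_F = ua[:, k + 1] - ua[:, k]  # forward work, samples from state k
        w_R = ub[:, k] - ub[:, k + 1]  # reverse work, samples from state k+1
        total += bar(w_F, w_R)
    return total


def bootstrap_bar_chain(u_per_state, n_bootstraps=100, seed=None):
    """Bootstrap ABOVE the BAR calls: resample all states in the same iteration,
    then rerun the entire pairwise chain on the resampled data."""
    rng = np.random.default_rng(seed)
    totals = [
        bar_chain([u[rng.integers(0, len(u), size=len(u))] for u in u_per_state])
        for _ in range(n_bootstraps)
    ]
    return bar_chain(u_per_state), float(np.std(totals, ddof=1))
```

Bootstrapping each BAR call separately and adding the errors in quadrature would ignore the fact that neighbouring pairs share a state's data; resampling once per iteration at the chain level avoids that.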