Explain sample_stats naming convention #1063

Merged 9 commits on Jan 16, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -45,6 +45,7 @@
for markdown/notebook parsing in docs ([1406](https://github.com/arviz-devs/arviz/pull/1406))
* Incorporated `input_core_dims` in `hdi` and `plot_hdi` docstrings ([1410](https://github.com/arviz-devs/arviz/pull/1410))
* Add documentation pages about experimental `SamplingWrapper`s usage ([1373](https://github.com/arviz-devs/arviz/pull/1373))
* Add `sample_stats` naming convention to the InferenceData schema ([1063](https://github.com/arviz-devs/arviz/pull/1063))
* Extend api documentation about `InferenceData` methods ([1338](https://github.com/arviz-devs/arviz/pull/1338))

### Experimental
37 changes: 25 additions & 12 deletions doc/source/schema/schema.md
@@ -40,18 +40,31 @@ Moreover, each group contains the following attributes:
Samples from the posterior distribution p(theta|y).

### `sample_stats`
Information and diagnostics for each `posterior` sample, provided by the inference backend. It may vary depending on the algorithm used by the backend (e.g. an affine invariant sampler has no energy associated). The naming convention used for `sample_stats` variables is the following:
* `lp`: (unnormalized) log probability for sample
* `step_size`
* `step_size_bar`
* `tune`: boolean variable indicating if the sampler is tuning or sampling
* `depth`:
* `tree_size`:
* `mean_tree_accept`:
* `diverging`: HMC-NUTS only, boolean variable indicating divergent transitions
* `energy`: HMC-NUTS only
* `energy_error`
* `max_energy_error`
Information and diagnostics for each `posterior` sample, provided by the inference
backend. It may vary depending on the algorithm used by the backend (e.g. an affine
invariant sampler has no energy associated). Therefore, none of these parameters
should be assumed to be present in the `sample_stats` group. The convention
below serves to ensure that _if_ a variable is present with one of these names,
it will correspond to the definition included here.
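
As a minimal sketch of what "not assumed to be present" means for downstream code, the snippet below uses a plain dictionary as a stand-in for the `sample_stats` group. The helper names `get_stat` and `count_divergences` and the toy values are hypothetical and only serve the illustration; they are not part of any library.

```python
def get_stat(sample_stats, name):
    """Return the variable stored under `name`, or None if the backend did not provide it."""
    return sample_stats[name] if name in sample_stats else None


def count_divergences(sample_stats):
    """Count divergent transitions, or return None when `diverging` is absent."""
    diverging = get_stat(sample_stats, "diverging")
    if diverging is None:
        # The sampler (e.g. an affine invariant one) has no such diagnostic.
        return None
    return sum(bool(flag) for flag in diverging)


# Toy stand-ins for the `sample_stats` group of two different backends.
hmc_stats = {"diverging": [False, True, False, True], "energy": [10.2, 11.0, 9.8, 12.5]}
ensemble_stats = {"lp": [-1.2, -0.9, -1.1, -1.0]}

print(count_divergences(hmc_stats))
print(count_divergences(ensemble_stats))
```

Returning `None` rather than `0` keeps "no divergences occurred" distinct from "this backend does not report divergences".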

The naming convention used for `sample_stats` variables is the following:

* `lp`: The joint log posterior density for the model (up to an additive constant).
* `acceptance_rate`: The average acceptance probability over all possible samples in the proposed tree.
* `step_size`: The current integration step size.
* `step_size_nom`: The nominal integration step size. The `step_size` may differ from this, for example if the step size is jittered. Should only be present if `step_size` is also present and it varies between samples (i.e. step size is jittered).
* `tree_depth`: The number of tree doublings in the balanced binary tree.
* `n_steps`: The number of leapfrog steps computed. It is related to `tree_depth` with `n_steps <= 2^tree_depth`.
* `diverging`: (boolean) Indicates the presence of leapfrog transitions with a large energy deviation
from the starting point, followed by termination of the trajectory. A deviation is "large" when `max_energy_error` exceeds a threshold.
* `energy`: The value of the Hamiltonian energy for the accepted proposal (up to an
additive constant).
* `energy_error`: The difference in the Hamiltonian energy between the initial point and
the accepted proposal.
* `max_energy_error`: The maximum absolute difference in Hamiltonian energy between the initial point and all possible samples in the proposed tree.
* `int_time`: The total integration time (static HMC sampler).
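
To make the relationships between some of these definitions concrete, here is a small sketch on toy numbers. The function names, the sign convention chosen for `energy_error` (accepted minus initial), and the threshold value are all assumptions made for illustration; actual backends may define them differently.

```python
def summarize_trajectory(initial_energy, candidate_energies, accepted_index, threshold=1000.0):
    """Derive energy-related stats for one toy trajectory (hypothetical helper).

    `threshold` is a placeholder value, not one prescribed by the convention.
    """
    accepted_energy = candidate_energies[accepted_index]
    # Difference between the accepted proposal and the initial point
    # (sign convention assumed here).
    energy_error = accepted_energy - initial_energy
    # Largest absolute deviation over all candidate samples in the tree.
    max_energy_error = max(abs(e - initial_energy) for e in candidate_energies)
    return {
        "energy": accepted_energy,
        "energy_error": energy_error,
        "max_energy_error": max_energy_error,
        "diverging": max_energy_error > threshold,
    }


def n_steps_within_bound(n_steps, tree_depth):
    """Check the stated relation n_steps <= 2^tree_depth."""
    return n_steps <= 2 ** tree_depth


stats = summarize_trajectory(10.0, [10.5, 9.0, 12.0], accepted_index=1)
print(stats)
print(n_steps_within_bound(7, 3))
```

Note that `diverging` depends on `max_energy_error`, not on `energy_error`: a trajectory can end at a point with a small energy error and still be flagged if some candidate along the way deviated strongly.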


### `log_likelihood`
Pointwise log likelihood data. Samples should match with `posterior` ones and its variables