-
-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explain sample_stats naming convention #1063
Conversation
@OriolAbril I have updated the descriptions and added a few missing variables. Are there any more variables to be added? |
I was thinking about it and I think we should either rename it (preferably) or not have it in sample stats. My main concerns are: as I understand it, one of the goals of ArviZ is also to ease the comparison between different inference libraries, we do not use it for now, but our users may use it and accessing a value in an inference data should not depend on the inference library; the second concern is that we may have one use for it at some point (I can't think of nothing else than including the acceptance prob warning in our summary right now but who knows) the naming differences may block or delay the new feature. This second concern is why I prefer renaming over removing it. |
I see. I will rename it once you confirm @OriolAbril @ahartikainen |
I also support renaming. |
For Stan the outputs for sample_stats are
1. The lp__ value also represents the potential energy in the Hamiltonian system and is rate bounded by |
Then there is also other settings see for example for CmdStan output
These are also probably needed, and also for other libs too. |
These would probably go under attributes though, right? |
Yes, I would put them under attributes. Maybe under |
That sounds good. Perhaps that should be a separate issue. For Turing.jl, we have the following sample stats for HMC:
For SMC:
Others that are supposedly parameters but I've never seen used:
|
I'll try to summarize the information gathered grouping equivalent sampler stats from different libraries. Descriptions and sources should still be checked in the original comment. Feel free to edit (I think members should have permission). HMC
@fehiepsi could you please check Pyro and NumPyro names? I think I am on the right track but not completely sure. I don't know enough about tfp as to include anything here. Is there anybody we could tag that comes to mind? SMCany thoughts @aloctavodia ?
MH?Notes: I think the only sampler stats currently used (in plotting or in stats) are PyMC3 reference: src1, (MH related, not really sure they are relevant: src2, src3, src4) |
Thank @OriolAbril, they are correct names. |
What tfp use? |
TFP does not have internal naming convention, as they are function output (tensor or array) and user are free to name it whatever they want - I was manually mapping it eg: https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Modeling_with_JointDistribution.ipynb#scrollTo=4qQdOPk90f7t |
Stan hmc has these (I need to verify)
|
Currently PyMC3's SMC does not return any statistics, but I should fix that. |
I updated my previous comment with the table summary to try to restart the discussion. I tried to use the names that felt had more consensus (e.g. more than one library had similar or equal names), please comment, I actually don't know the reasons behind any of the naming conventions chosen by each library. The one I am having trouble with Regarding |
Btw, the renaming work is to be done in this PR itself right? Or leave this one only for the descriptions of the existing |
000776a
to
af4834f
Compare
Checking in on old PRs. Any possibility of bringing this over the line or should we close? |
This one depends on core contributors having some consensus on the names to use. Maybe a lab meeting could help in finishing this? |
Sounds good. Ill add it as topic for next lab meeting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly clarifying questions. Also your above table is super useful, and I got a question just this week about the meaning if these parameters, and it would be great if this table was somewhere. And perhaps ArviZ's docs is the best place for it to be. Would you like to include it?
We have agreed on the names, see table in previous comment, but we still need to update the PR to match said table and add the definitions. There are some definitions in previous comments when describing Stan and Turing sample stats. I think now would be a good time to implement the convention so sample stats can be used in https://github.com/arviz-devs/arviz_dashboard independently of the sampling backend |
Explicit name is better, then let's add description somewhere what is what. E.g. attrs field could contain something? |
Okay, so will you be working on this PR now? I don't have much knowledge about the definitions but I can update if you want |
7690699
to
c7ff582
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to mention sample_stats_prior
which is analogous to sample_stats
but the 1to1 correspondence is with the prior
and not with the posterior
Oh, did we miss the
Also |
Looks like it. |
added definition proposals.
Co-authored-by: Ari Hartikainen <ahartikainen@users.noreply.github.com>
dd3a59b
to
97e6ba5
Compare
Maybe I missed something, but I saw
|
These were added after this comment was written, let's continue this in pymc-devs/pymc-examples#95 better |
Description
fixes #1053
Checklist