Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multiple parameters of interest #179

Open
lukasheinrich opened this issue May 23, 2018 · 28 comments
Open

Add support for multiple parameters of interest #179

lukasheinrich opened this issue May 23, 2018 · 28 comments
Labels
feat/enhancement New feature or request

Comments

@lukasheinrich
Copy link
Contributor

Description

The infrastructure should be generic enough to support limit setting on multiple POIs,

@matthewfeickert matthewfeickert added the feat/enhancement New feature or request label Jun 7, 2018
@swertz
Copy link

swertz commented Dec 17, 2019

What is the status of this? I understand multiple free parameters (normfactor) are possible, but I assume they will all be profiled? It's not entirely clear from the documentation...

@lukasheinrich
Copy link
Contributor Author

we don't have support for this now, but since we added capabilituy to set arbitrary parameters constant in #653 this should be simple to add, once #690 is in, we can expand this to multiple POIs. If you have a usecase, I'm happy to add this functionality

@swertz
Copy link

swertz commented Dec 17, 2019

That's great! I'm interested in doing likelihood unfolding, but there are many other use cases for multi-parameter fits (for instance, fitting EFT parameters). Here CLs limits are not really useful but one would be interested in general confidence intervals, Minos intervals (while either profiling other POIs or keeping them fixed), beside simply getting the best fit and covariance matrix.

Some remarks though:

  • For likelihood unfolding one should also be able to implement regularisation, i.e. an additional constraint term in the likelihood that depends on the POIs.
  • Stuff like EFT parameter fits would require going beyond simple linear scaling of the signal yields with the POIs.

AFAIK neither of those is supported at the moment?

@mswiatlo
Copy link

mswiatlo commented May 4, 2020

Hi,

I think we're also interested in this now, @lukasheinrich . For the ATLAS hh4b analysis we'd like to do simultaneous ggF and VBF fits, for example, and we're also interested in SMEFT interpretations where we could fit several parameters simultaneously. It'd be great to get support in PyHF for these kind of models!

Best,
Max

@matthewfeickert
Copy link
Member

matthewfeickert commented May 4, 2020

We can look into adding this into an upcoming release once v0.4.2 and v0.5.0 are shipped — which means soon!

@alexander-held
Copy link
Member

Hi, for basic MLE fits a "POI" is not different from any other normalization factor, so this should already work at the moment. For things like parameter rankings where the calculation is done wrt. a specific parameter (= the POI) it matters.

@kratsg
Copy link
Contributor

kratsg commented Jul 26, 2020

We will likely explore this in our run up to v0.6.0. #846 and #989 are working towards this -- since you want to fix multiple parameters as part of the fit. I will mention two things in brief:

  • as part of v0.5.0 that was released recently, we provide extensible support for custom (user-provided) optimizers to plug into pyhf that can be used in the meantime until we have support for multiple fixed parameters. Support will come through the ability to have fits with multiple fixed parameters. We already support an unconstrained fit across all parameters, some of which might be POIs.
  • HistFactory only allows for the declaration of a single POI in the specification. If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.

@kratsg
Copy link
Contributor

kratsg commented Sep 9, 2020

#1051 is in, so we can move forward.

@alexander-held
Copy link
Member

If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.

@kratsg I have one here: /afs/cern.ch/user/a/alheld/public/pyhf_multi_POI

<POI>Signal_norm Bg_norm</POI>

I am not sure whether this is using HistFactory as intended, but it gets correctly interpreted in at least two (ATLAS-internal) tools I tried it with.

@jagrundy
Copy link

Hi,

I was looking forward to this update being part of v0.6.0 but I can see it's now been pushed back to v.0.7.0. Do you have a time estimate for when this development will be complete? Thanks!

Kind regards,
James

@lukasheinrich
Copy link
Contributor Author

Now that we have toys in I think we can prfioritize this. We definitely want to deliver 0.7.0 more quickly than 0.6.0 and I hope within perhaps O(2 months) we can at least have some demo notebooks to to multi-POI fits (the code right now should already enable this a you can fits pdfs with arbitrary fixed parameters, but we're lacking convenience APIs.

@jagrundy have you tried doing this manually by any chance?

@jagrundy
Copy link

Hi Lukas,

Thank you for the quick reply and for the update. I haven't tried doing this manually but am aware that it's possible.

James

@kratsg
Copy link
Contributor

kratsg commented Feb 16, 2021

Yeah, the timeline seems about right. There are a few hurdles in order to get this happening:

  • JSON schema (currently POI is stored as a string): can either accept string+space-separated [no change] or list of strings [change to schema]
  • anything relying on model.config.poi_index: one assumption in a large part of the bookkeeping is that there's a single POI. Luckily, not too many core functions rely on this (as fitting can support multiple POIs)

I think while both of these hurdles are rather minor, it's mainly figuring out a consistent API that doesn't throw off a lot of users.

@matthewfeickert
Copy link
Member

matthewfeickert commented Feb 16, 2021

... it's mainly figuring out a consistent API that doesn't throw off a lot of users.

@jagrundy @swertz @mswiatlo what would be helpful is to try to understand what API users are really expecting. As it seems that you're waiting for this can you help by doing the following:

  1. Create example workspaces with the POIs you can share with us.
  2. Create and share a script with us that uses the existing API and the example workspaces to perform what you hoping that a future API for multiple POI would do.
  3. Give examples, or explain in detail, what you're expecting the API return structure to be.

This will help us to be able to get the thing that you're actually wanting out sooner.

@alexander-held
Copy link
Member

alexander-held commented Feb 16, 2021

Some thoughts from my perspective:

  • Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like "Signal normalization"? Currently those are possible I think.
  • pyhf.infer.hypotest would need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info. infer.fixed_poi_fit could probably behave exactly the same, and the test statistics probably too?
  • model.config.poi_index could probably work as it currently does for single POI models if you want to stay backwards compatible. Another function should return the index for a given POI name (with the name as optional kwarg), could again default to the first in the list and possibly warn/info if multiple POIs are in the workspace.
  • A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
    for poi in model.config.poi_list:
        pyhf.infer.hypotest(..., poi=poi)
  • In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the infer functions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.

@Sally27
Copy link

Sally27 commented Mar 14, 2022

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

@kratsg
Copy link
Contributor

kratsg commented Mar 15, 2022

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.

@kratsg
Copy link
Contributor

kratsg commented Mar 15, 2022

  • Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like "Signal normalization"? Currently those are possible I think.

ROOT doesn't quite allow this in the HiFa XML spec. Not sure what we do here. Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...

  • pyhf.infer.hypotest would need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info. infer.fixed_poi_fit could probably behave exactly the same, and the test statistics probably too?

Yes. Like how we handle multiple measurements.

  • A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
    for poi in model.config.poi_list:
        pyhf.infer.hypotest(..., poi=poi)

Yes.

  • In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the infer functions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.

This would be where we'd ask for the first/second/third/nth POI fit. mle.infer.fixed_nth_poi_fit(...) which is a wrapper.

@Sally27
Copy link

Sally27 commented Mar 21, 2022

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.

Excellent, I am very glad to hear that we can do the multiple POI in the code already, I will try this in my case.

@kratsg
Copy link
Contributor

kratsg commented Mar 21, 2022

You can look here (fixed_poi_fit is only a few-lines wrapper around pyhf.infer.mle.fit which handles all situations):

pyhf/src/pyhf/infer/mle.py

Lines 196 to 202 in 1884c6c

init_pars = [*(init_pars or pdf.config.suggested_init())]
fixed_params = [*(fixed_params or pdf.config.suggested_fixed())]
init_pars[pdf.config.poi_index] = poi_val
fixed_params[pdf.config.poi_index] = True
return fit(data, pdf, init_pars, par_bounds, fixed_params, **kwargs)

setting fixed_params[index] = True and init_pars[index] = fixed_value is all you need, for each POI index. We just haven't gotten to being able to serialize this information consistently.

@alexander-held
Copy link
Member

Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...

A list of strings is arguably the correct solution for this, and I think it's absolutely worth breaking this one particular roundtrip use case in favor of streamlining the experience for everything else. As far as I can tell, the issue only arises in this quite specific setup:

  • workspace built / edited in JSON format (otherwise parameters would not have spaces anyway),
  • translated to xml+ROOT (this can emit a suitable warning, and auto-convert spaces to underlines or similar),
  • translated back to JSON.

The extent to which things "break" is that parameter names change automatically (or have to be changed manually), and that is only relevant if they contain a space, which presumably is very rare to begin with. Given that this can be caught during conversion and the user can be informed, I think this is quite minor.

@alexander-held
Copy link
Member

alexander-held commented Mar 21, 2022

I tried out a roundtrip from JSON with a POI called Signal norm. Conversion to xml is fine, and xml back to JSON recovers the original spec. A problem only arises with hist2workspace, which looks for two POIs in that setup and cannot find either of them:

WARNING: Can't find parameter of interest: Signal in Workspace. Not setting in ModelConfig.
WARNING: Can't find parameter of interest: norm in Workspace. Not setting in ModelConfig.

Along similar lines, I tried out adding an emoji to a normfactor name: no obvious errors, but in the json2xml -> xml2json roundtrip there's some (presumably encoding-related) slight change: 😀 becomes \ud83d\ude00. (I'm not trying to argue that support for this kind of stuff is needed, but perhaps it may be useful to add a projection to json2xml when encountering Unicode characters, within pyhf itself I see no issues.)

When adding the same emoji to a nuisance parameter name (which has an associated histogram), json2xml works fine, but neither xml2json nor hist2workspace work. That might be a lack of support in ROOT though, since the errors seem to be related to finding the histogram in the .root file.

(edit: the last point was unrelated to pyhf, see scikit-hep/uproot5#576)

@amartyarej
Copy link

Hi,

Can you please confirm if in the current version, multiple POI workspace can be converted to pyhf in JSON format as desired by default?
Given that currently many analyses are using profile likelihood unfolding and some are published, it will be good to have this feature for likelihood publication.

Regards,
Amartya

@kratsg
Copy link
Contributor

kratsg commented Jul 14, 2022

pyhf xml2json functionality works normally in the JSON format. It's just not something that's implemented in terms of the functionality. The serialization should be ok.

@amartyarej
Copy link

I see. Thanks!

@alexander-held
Copy link
Member

The "as desired" part is arguable, I would personally find a format where the POI entry is a list of strings better in the longer term. This is a breaking change however. Right now you will get a single string with spaces between POIs like

"poi": "poi_1 poi_2 poi_3"

which pyhf (as of the current implementation) will fail to convert into a model.

You could consider only specifying a single POI in the model, as this is technically metadata without immediate impact on inference.

If there will be a move to a list format ["poi_1", "poi_2", "poi_3"] at some point in the future, that would either require a workspace update or be handled in the pyhf implementation of reading the workspace information (which is tricker since currently whitespaces are supported in POI names, causing possible ambiguities).

@kratsg
Copy link
Contributor

kratsg commented Jul 14, 2022

@amartyarej how quick of a timescale are you asking for? We have a ton of changes in the pipeline.

@amartyarej
Copy link

We are planning to finalize our studies by end of August

@kratsg kratsg moved this to To do in pyhf v0.8.0 Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat/enhancement New feature or request
Projects
Status: To do
Development

No branches or pull requests

9 participants