Add support for multiple parameters of interest #179

lukasheinrich · 2018-05-23T15:51:59Z

Description

The infrastructure should be generic enough to support limit setting on multiple POIs,

swertz · 2019-12-17T16:50:05Z

What is the status of this? I understand multiple free parameters (normfactor) are possible, but I assume they will all be profiled? It's not entirely clear from the documentation...

lukasheinrich · 2019-12-17T16:53:28Z

we don't have support for this now, but since we added capabilituy to set arbitrary parameters constant in #653 this should be simple to add, once #690 is in, we can expand this to multiple POIs. If you have a usecase, I'm happy to add this functionality

swertz · 2019-12-17T18:20:14Z

That's great! I'm interested in doing likelihood unfolding, but there are many other use cases for multi-parameter fits (for instance, fitting EFT parameters). Here CLs limits are not really useful but one would be interested in general confidence intervals, Minos intervals (while either profiling other POIs or keeping them fixed), beside simply getting the best fit and covariance matrix.

Some remarks though:

For likelihood unfolding one should also be able to implement regularisation, i.e. an additional constraint term in the likelihood that depends on the POIs.
Stuff like EFT parameter fits would require going beyond simple linear scaling of the signal yields with the POIs.

AFAIK neither of those is supported at the moment?

mswiatlo · 2020-05-04T13:54:56Z

Hi,

I think we're also interested in this now, @lukasheinrich . For the ATLAS hh4b analysis we'd like to do simultaneous ggF and VBF fits, for example, and we're also interested in SMEFT interpretations where we could fit several parameters simultaneously. It'd be great to get support in PyHF for these kind of models!

Best,
Max

matthewfeickert · 2020-05-04T17:24:48Z

We can look into adding this into an upcoming release once v0.4.2 and v0.5.0 are shipped — which means soon!

alexander-held · 2020-07-08T10:55:32Z

Hi, for basic MLE fits a "POI" is not different from any other normalization factor, so this should already work at the moment. For things like parameter rankings where the calculation is done wrt. a specific parameter (= the POI) it matters.

kratsg · 2020-07-26T04:39:55Z

We will likely explore this in our run up to v0.6.0. #846 and #989 are working towards this -- since you want to fix multiple parameters as part of the fit. I will mention two things in brief:

as part of v0.5.0 that was released recently, we provide extensible support for custom (user-provided) optimizers to plug into pyhf that can be used in the meantime until we have support for multiple fixed parameters. Support will come through the ability to have fits with multiple fixed parameters. We already support an unconstrained fit across all parameters, some of which might be POIs.
HistFactory only allows for the declaration of a single POI in the specification. If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.

kratsg · 2020-09-09T20:51:23Z

#1051 is in, so we can move forward.

alexander-held · 2020-10-21T14:00:23Z

If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.

@kratsg I have one here: /afs/cern.ch/user/a/alheld/public/pyhf_multi_POI

<POI>Signal_norm Bg_norm</POI>

I am not sure whether this is using HistFactory as intended, but it gets correctly interpreted in at least two (ATLAS-internal) tools I tried it with.

jagrundy · 2021-02-16T09:48:38Z

Hi,

I was looking forward to this update being part of v0.6.0 but I can see it's now been pushed back to v.0.7.0. Do you have a time estimate for when this development will be complete? Thanks!

Kind regards,
James

lukasheinrich · 2021-02-16T10:05:12Z

Now that we have toys in I think we can prfioritize this. We definitely want to deliver 0.7.0 more quickly than 0.6.0 and I hope within perhaps O(2 months) we can at least have some demo notebooks to to multi-POI fits (the code right now should already enable this a you can fits pdfs with arbitrary fixed parameters, but we're lacking convenience APIs.

@jagrundy have you tried doing this manually by any chance?

jagrundy · 2021-02-16T10:09:55Z

Hi Lukas,

Thank you for the quick reply and for the update. I haven't tried doing this manually but am aware that it's possible.

James

kratsg · 2021-02-16T12:42:01Z

Yeah, the timeline seems about right. There are a few hurdles in order to get this happening:

JSON schema (currently POI is stored as a string): can either accept string+space-separated [no change] or list of strings [change to schema]
anything relying on model.config.poi_index: one assumption in a large part of the bookkeeping is that there's a single POI. Luckily, not too many core functions rely on this (as fitting can support multiple POIs)

I think while both of these hurdles are rather minor, it's mainly figuring out a consistent API that doesn't throw off a lot of users.

matthewfeickert · 2021-02-16T15:26:59Z

... it's mainly figuring out a consistent API that doesn't throw off a lot of users.

@jagrundy @swertz @mswiatlo what would be helpful is to try to understand what API users are really expecting. As it seems that you're waiting for this can you help by doing the following:

Create example workspaces with the POIs you can share with us.
Create and share a script with us that uses the existing API and the example workspaces to perform what you hoping that a future API for multiple POI would do.
Give examples, or explain in detail, what you're expecting the API return structure to be.

This will help us to be able to get the thing that you're actually wanting out sooner.

alexander-held · 2021-02-16T15:44:25Z

Some thoughts from my perspective:

Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like "Signal normalization"? Currently those are possible I think.
pyhf.infer.hypotest would need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info. infer.fixed_poi_fit could probably behave exactly the same, and the test statistics probably too?
model.config.poi_index could probably work as it currently does for single POI models if you want to stay backwards compatible. Another function should return the index for a given POI name (with the name as optional kwarg), could again default to the first in the list and possibly warn/info if multiple POIs are in the workspace.
A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
```
for poi in model.config.poi_list:
    pyhf.infer.hypotest(..., poi=poi)
```
In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the infer functions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.

Sally27 · 2022-03-14T11:15:15Z

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

kratsg · 2022-03-15T02:17:52Z

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.

kratsg · 2022-03-15T02:21:41Z

Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like "Signal normalization"? Currently those are possible I think.

ROOT doesn't quite allow this in the HiFa XML spec. Not sure what we do here. Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...

pyhf.infer.hypotest would need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info. infer.fixed_poi_fit could probably behave exactly the same, and the test statistics probably too?

Yes. Like how we handle multiple measurements.

A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
for poi in model.config.poi_list:
    pyhf.infer.hypotest(..., poi=poi)

Yes.

In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the infer functions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.

This would be where we'd ask for the first/second/third/nth POI fit. mle.infer.fixed_nth_poi_fit(...) which is a wrapper.

Sally27 · 2022-03-21T09:07:05Z

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.

Excellent, I am very glad to hear that we can do the multiple POI in the code already, I will try this in my case.

kratsg · 2022-03-21T10:21:11Z

You can look here (fixed_poi_fit is only a few-lines wrapper around pyhf.infer.mle.fit which handles all situations):

pyhf/src/pyhf/infer/mle.py

Lines 196 to 202 in 1884c6c

    
           init_pars = [*(init_pars or pdf.config.suggested_init())] 
        
           fixed_params = [*(fixed_params or pdf.config.suggested_fixed())] 
        
           init_pars[pdf.config.poi_index] = poi_val 
        
           fixed_params[pdf.config.poi_index] = True 
        
           return fit(data, pdf, init_pars, par_bounds, fixed_params, **kwargs)

setting fixed_params[index] = True and init_pars[index] = fixed_value is all you need, for each POI index. We just haven't gotten to being able to serialize this information consistently.

alexander-held · 2022-03-21T10:23:10Z

Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...

A list of strings is arguably the correct solution for this, and I think it's absolutely worth breaking this one particular roundtrip use case in favor of streamlining the experience for everything else. As far as I can tell, the issue only arises in this quite specific setup:

workspace built / edited in JSON format (otherwise parameters would not have spaces anyway),
translated to xml+ROOT (this can emit a suitable warning, and auto-convert spaces to underlines or similar),
translated back to JSON.

The extent to which things "break" is that parameter names change automatically (or have to be changed manually), and that is only relevant if they contain a space, which presumably is very rare to begin with. Given that this can be caught during conversion and the user can be informed, I think this is quite minor.

alexander-held · 2022-03-21T10:54:15Z

I tried out a roundtrip from JSON with a POI called Signal norm. Conversion to xml is fine, and xml back to JSON recovers the original spec. A problem only arises with hist2workspace, which looks for two POIs in that setup and cannot find either of them:

WARNING: Can't find parameter of interest: Signal in Workspace. Not setting in ModelConfig.
WARNING: Can't find parameter of interest: norm in Workspace. Not setting in ModelConfig.

Along similar lines, I tried out adding an emoji to a normfactor name: no obvious errors, but in the json2xml -> xml2json roundtrip there's some (presumably encoding-related) slight change: 😀 becomes \ud83d\ude00. (I'm not trying to argue that support for this kind of stuff is needed, but perhaps it may be useful to add a projection to json2xml when encountering Unicode characters, within pyhf itself I see no issues.)

When adding the same emoji to a nuisance parameter name (which has an associated histogram), json2xml works fine, but neither xml2json nor hist2workspace work. That might be a lack of support in ROOT though, since the errors seem to be related to finding the histogram in the .root file.

(edit: the last point was unrelated to pyhf, see scikit-hep/uproot5#576)

amartyarej · 2022-07-14T15:15:15Z

Hi,

Can you please confirm if in the current version, multiple POI workspace can be converted to pyhf in JSON format as desired by default?
Given that currently many analyses are using profile likelihood unfolding and some are published, it will be good to have this feature for likelihood publication.

Regards,
Amartya

kratsg · 2022-07-14T15:53:23Z

pyhf xml2json functionality works normally in the JSON format. It's just not something that's implemented in terms of the functionality. The serialization should be ok.

amartyarej · 2022-07-14T17:14:58Z

I see. Thanks!

alexander-held · 2022-07-14T17:17:16Z

The "as desired" part is arguable, I would personally find a format where the POI entry is a list of strings better in the longer term. This is a breaking change however. Right now you will get a single string with spaces between POIs like

"poi": "poi_1 poi_2 poi_3"

which pyhf (as of the current implementation) will fail to convert into a model.

You could consider only specifying a single POI in the model, as this is technically metadata without immediate impact on inference.

If there will be a move to a list format ["poi_1", "poi_2", "poi_3"] at some point in the future, that would either require a workspace update or be handled in the pyhf implementation of reading the workspace information (which is tricker since currently whitespaces are supported in POI names, causing possible ambiguities).

kratsg · 2022-07-14T21:53:33Z

@amartyarej how quick of a timescale are you asking for? We have a ton of changes in the pipeline.

amartyarej · 2022-07-15T15:43:46Z

We are planning to finalize our studies by end of August

matthewfeickert added the feat/enhancement New feature or request label Jun 7, 2018

alexander-held mentioned this issue Mar 21, 2022

Name validity checks in pyhf json2xml for ROOT compatibility #1267

Open

kratsg added this to pyhf v0.8.0 Aug 30, 2022

kratsg moved this to To do in pyhf v0.8.0 Aug 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for multiple parameters of interest #179

Add support for multiple parameters of interest #179

lukasheinrich commented May 23, 2018

swertz commented Dec 17, 2019

lukasheinrich commented Dec 17, 2019

swertz commented Dec 17, 2019

mswiatlo commented May 4, 2020

matthewfeickert commented May 4, 2020 •

edited

Loading

alexander-held commented Jul 8, 2020

kratsg commented Jul 26, 2020

kratsg commented Sep 9, 2020

alexander-held commented Oct 21, 2020

jagrundy commented Feb 16, 2021

lukasheinrich commented Feb 16, 2021

jagrundy commented Feb 16, 2021

kratsg commented Feb 16, 2021

matthewfeickert commented Feb 16, 2021 •

edited

Loading

alexander-held commented Feb 16, 2021 •

edited

Loading

Sally27 commented Mar 14, 2022

kratsg commented Mar 15, 2022

kratsg commented Mar 15, 2022

Sally27 commented Mar 21, 2022

kratsg commented Mar 21, 2022

alexander-held commented Mar 21, 2022

alexander-held commented Mar 21, 2022 •

edited

Loading

amartyarej commented Jul 14, 2022

kratsg commented Jul 14, 2022

amartyarej commented Jul 14, 2022

alexander-held commented Jul 14, 2022

kratsg commented Jul 14, 2022

amartyarej commented Jul 15, 2022

Add support for multiple parameters of interest #179

Add support for multiple parameters of interest #179

Comments

lukasheinrich commented May 23, 2018

Description

swertz commented Dec 17, 2019

lukasheinrich commented Dec 17, 2019

swertz commented Dec 17, 2019

mswiatlo commented May 4, 2020

matthewfeickert commented May 4, 2020 • edited Loading

alexander-held commented Jul 8, 2020

kratsg commented Jul 26, 2020

kratsg commented Sep 9, 2020

alexander-held commented Oct 21, 2020

jagrundy commented Feb 16, 2021

lukasheinrich commented Feb 16, 2021

jagrundy commented Feb 16, 2021

kratsg commented Feb 16, 2021

matthewfeickert commented Feb 16, 2021 • edited Loading

alexander-held commented Feb 16, 2021 • edited Loading

Sally27 commented Mar 14, 2022

kratsg commented Mar 15, 2022

kratsg commented Mar 15, 2022

Sally27 commented Mar 21, 2022

kratsg commented Mar 21, 2022

alexander-held commented Mar 21, 2022

alexander-held commented Mar 21, 2022 • edited Loading

amartyarej commented Jul 14, 2022

kratsg commented Jul 14, 2022

amartyarej commented Jul 14, 2022

alexander-held commented Jul 14, 2022

kratsg commented Jul 14, 2022

amartyarej commented Jul 15, 2022

matthewfeickert commented May 4, 2020 •

edited

Loading

matthewfeickert commented Feb 16, 2021 •

edited

Loading

alexander-held commented Feb 16, 2021 •

edited

Loading

alexander-held commented Mar 21, 2022 •

edited

Loading