
Fix calculation of curve fit weights #1224

Merged

Conversation

nkanazawa1989
Collaborator

Summary

This PR updates the calculation of the weights used to compute residuals in the curve fitting.

Details and comments

When the error bars of some data points are significantly small, those points become the dominant source of the residual to minimize. Other data points then contribute little to the fit, causing a local overfit to certain points. This is fixed by clipping the weights to remove outliers.
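The fix can be sketched as follows. This is a minimal illustration of percentile-based weight clipping, not the actual qiskit-experiments implementation; the function name and signature are assumptions:

```python
import numpy as np

def clipped_weights(yerr, percentile=90):
    """Compute fit weights 1/yerr, clipping outliers above the given percentile.

    Hypothetical sketch: zero or near-zero error bars would otherwise produce
    huge (or infinite) weights that dominate the residual.
    """
    yerr = np.asarray(yerr, dtype=float)
    with np.errstate(divide="ignore"):
        raw = 1.0 / yerr  # raw weights: inverse of the error bars
    # Clip extreme weights so a few tiny error bars cannot dominate the fit.
    valid = raw[np.isfinite(raw)]
    cutoff = np.percentile(valid, percentile)
    return np.clip(raw, None, cutoff)
```

With this, a point whose error bar is exactly zero still participates in the fit, but with a weight no larger than the 90th percentile of the finite weights.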

@nkanazawa1989 nkanazawa1989 added labels: backport stable potential (the issue or PR might be minimal and/or important enough to backport to stable), Changelog: Bugfix (include in the "Fixed" section of the changelog) Jul 12, 2023
Collaborator

@coruscating coruscating left a comment


Thanks for this fix. I tried it on some example data with yerr set artificially low, and the fitting is much improved. The 90th-percentile clipping of outlier weights seems reasonable. I wonder if #1107 might be solved by this fix, although previously the infinite weights should have been handled by LMFIT, so I'm still not sure what's causing that random error.

@CLAassistant

CLAassistant commented Jul 18, 2023

CLA assistant check
All committers have signed the CLA.

@itoko
Contributor

itoko commented Jul 19, 2023

The 90th-percentile clipping feels a bit ad hoc to me. Do we really need to use WLS (weighted least squares) for fitting in general? Could you give us some background on why we decided to use WLS instead of OLS (ordinary least squares)?
(c.f. https://www.itl.nist.gov/div898/handbook/pmd/section1/pmd143.htm)

@nkanazawa1989
Collaborator Author

nkanazawa1989 commented Jul 19, 2023

In the backend code this is not considered. However, in qiskit-experiments we can precisely compute error propagation during data processing and formatting thanks to the uncertainties package, and we decided to use this error information in the fitting as well. This also impacts the chi-squared of the fit.

Another example is RB. This experiment uses a sample average over the seeds (rather than the sampling error), and the survival probability tends to diverge at shorter Clifford lengths due to variation in the total counts of the physical gates that are the error source, while it converges to a particular P1 at the tail. So I think weighted least squares can put more weight on the tail, yielding a better estimate of the exponent.
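A minimal sketch of how per-point uncertainties enter a weighted residual and the reduced chi-squared mentioned above. The function names here are hypothetical, not the library API:

```python
import numpy as np

def weighted_residual(params, x, y, sigma, model):
    # Dividing by sigma gives points with small error bars a larger weight.
    return (y - model(x, *params)) / sigma

def reduced_chisq(params, x, y, sigma, model):
    # Sum of squared weighted residuals per degree of freedom.
    r = weighted_residual(params, x, y, sigma, model)
    return float(np.sum(r**2) / (len(x) - len(params)))
```

With even sigma this reduces to ordinary least squares up to a constant factor, which is why the clipped weights behave closer to OLS.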

@nkanazawa1989 nkanazawa1989 added this pull request to the merge queue Aug 15, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 15, 2023
@nkanazawa1989 nkanazawa1989 added this to the Release 0.6 milestone Aug 18, 2023
@coruscating coruscating added this pull request to the merge queue Aug 29, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 29, 2023
@nkanazawa1989 nkanazawa1989 added this pull request to the merge queue Aug 31, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 31, 2023
@nkanazawa1989 nkanazawa1989 force-pushed the fix/handling_of_zero_yerr branch from bc5aa54 to 88b8086 Compare August 31, 2023 06:39
@nkanazawa1989 nkanazawa1989 added this pull request to the merge queue Aug 31, 2023
Merged via the queue into qiskit-community:main with commit 6a06e74 Aug 31, 2023
mergify bot pushed a commit that referenced this pull request Aug 31, 2023
(cherry picked from commit 6a06e74)
wshanks pushed a commit that referenced this pull request Sep 5, 2023
This is an automatic backport of pull request #1224 done by
[Mergify](https://mergify.com).

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
@wshanks
Collaborator

wshanks commented Sep 5, 2023

@itoko There was some related discussion in #417 and #939. I tend to agree with you that the weights are perhaps not doing enough good to be worth using. This PR's clipping of the weights to the 90th percentile might not be much different from using ordinary least squares (it would be interesting to compare). It might be worth continuing a discussion about weights and fit quality somewhere.

I think part of the issue this PR addresses is that our fit models are not infinitely accurate; there may be small deviations from the analytic functions in the dependence on the independent variables. When the weights are all about even, these deviations do not matter much for a fit to parameters within a few percent. Assuming the binomial distribution, though, we can get quite small errors and quite large weights, and then the curve fitting routines are very punishing for small deviations from the model. Part of what makes me say this is that for a sample with good SNR, when it gives nearly all-0 or all-1 counts, I trust that the result really is close to 0 or 1 with small uncertainty, but I have seen that give a poor chi-squared for data sets that look pretty reasonable by eye.
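The binomial point above can be illustrated numerically: probabilities near 0 or 1 give very small standard errors, hence very large 1/sigma weights. This assumes the standard binomial error sigma = sqrt(p(1-p)/shots) and an arbitrary shot count:

```python
import numpy as np

shots = 1000  # assumed shot count, for illustration only
p = np.array([0.5, 0.99, 0.999])  # measured survival probabilities

# Binomial standard error shrinks rapidly as p approaches 0 or 1 ...
sigma = np.sqrt(p * (1 - p) / shots)
# ... so the corresponding 1/sigma weights blow up at the extremes.
weights = 1.0 / sigma
```

Here the point at p = 0.999 gets a weight more than an order of magnitude larger than the point at p = 0.5, so a small model deviation there contributes a disproportionately large chi-squared.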

nkanazawa1989 added a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 10, 2024