fix: make histogramQuantile handle case of zero samples #5419

Merged
wolffcm merged 2 commits into master from fix/histogram-quantile
May 26, 2023

@wolffcm wolffcm commented May 26, 2023

Closes #5415

When a histogram contains no observations/samples (the count in every bucket is zero), produce a null value.

Checklist

Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.

  • ✏️ Write a PR description, regardless of triviality, to include the value of this PR
  • 🔗 Reference related issues
  • 🏃 Test cases are included to exercise the new code
  • 🧪 If new packages are being introduced to stdlib, link to Working Group discussion notes and ensure it lands under experimental/
  • 📖 If language features are changing, ensure docs/Spec.md has been updated

Dear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.

Co-authored-by: Gavin Cabbage <gavincabbage@users.noreply.github.com>
return true
}

func (t *histogramQuantileTransformation) computeQuantile(cdf []bucket) (quantileResult, error) {

The issue here was that the Flux stdlib function histogramQuantile returned a wrong answer for some input data.

This function accepts a cumulative distribution function (a cumulative histogram produced from the input table data) and produces the requested quantile.

When the cdf contains all zeroes, this function would return the bound of the last histogram bucket, which is incorrect. The right thing to do for that case is to return a null value, since we can't compute a quantile if we didn't actually receive any observations.
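To make the zero-observation case concrete, here is a minimal standalone sketch of computing a quantile from a cumulative histogram. It is not the Flux implementation: the `bucket` type, the `quantileOrNull` name, the `minValue` parameter, and the linear interpolation are all assumptions made for illustration, and a `false` second return value stands in for a null result.

```go
package main

import "math"

// bucket is a hypothetical stand-in for one cumulative histogram bucket:
// count is the number of observations less than or equal to upperBound.
type bucket struct {
	count      float64
	upperBound float64
}

// quantileOrNull returns (value, true) when the quantile q can be computed
// and (0, false) when it cannot, which a caller would render as null.
// Illustrative sketch only; the names and interpolation are assumptions.
func quantileOrNull(cdf []bucket, q, minValue float64) (float64, bool) {
	if len(cdf) == 0 {
		return 0, false
	}
	total := cdf[len(cdf)-1].count
	if total == 0 {
		// No observations were recorded, so no quantile exists.
		return 0, false
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	idx := len(cdf) - 1
	for i, b := range cdf {
		if b.count >= rank {
			idx = i
			break
		}
	}

	lowerBound, lowerCount := minValue, 0.0
	if idx > 0 {
		lowerBound, lowerCount = cdf[idx-1].upperBound, cdf[idx-1].count
	}
	b := cdf[idx]
	if math.IsInf(b.upperBound, 1) || b.count == lowerCount {
		// Cannot interpolate into an infinite or empty bucket.
		return lowerBound, true
	}
	// Interpolate linearly within the selected bucket.
	frac := (rank - lowerCount) / (b.count - lowerCount)
	return lowerBound + frac*(b.upperBound-lowerBound), true
}
```

Without the `total == 0` guard, the rank would be zero for every requested quantile and the function would still report some bucket bound as if it were a real answer, which is the kind of misleading result this PR removes.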

// "force" is not possible because isMonotonic will fix the buckets
return quantileResult{}, errors.Newf(codes.Internal, "unknown or unexpected value for onNonmonotonic: %q", t.spec.OnNonmonotonic)
}
}

Sometimes the histogram buckets are not monotonic (which they should be if they are cumulative) due to late-arriving data on the edge. The OnNonmonotonic parameter describes what to do in this case.

Checking for monotonicity first (and fixing it if needed and requested by the user) avoids a bug: when monotonicity was being forced, the total observation count was read from the last bucket before that bucket had been "fixed".

This is not really related to the issue the user found, but I saw it here and fixed it. The test case histogramQuantileOnNonmonotonicForceLastBucket below verifies this fix.
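The ordering problem can be sketched in isolation as follows. This is a hedged illustration rather than the code in this PR: `cumBucket`, `forceMonotonic`, and `totalCount` are hypothetical names, and "forcing" is assumed to mean raising a dipping count to the count of the preceding bucket.

```go
package main

// cumBucket is a hypothetical cumulative histogram bucket.
type cumBucket struct {
	count      float64
	upperBound float64
}

// forceMonotonic raises, in place, any count that dips below its
// predecessor so the counts never decrease. It reports whether any
// bucket was changed. The repair strategy is an assumption.
func forceMonotonic(cdf []cumBucket) bool {
	changed := false
	for i := 1; i < len(cdf); i++ {
		if cdf[i].count < cdf[i-1].count {
			cdf[i].count = cdf[i-1].count
			changed = true
		}
	}
	return changed
}

// totalCount repairs the buckets first and only then reads the total
// from the last bucket, so the total reflects the repaired histogram.
func totalCount(cdf []cumBucket) float64 {
	if len(cdf) == 0 {
		return 0
	}
	forceMonotonic(cdf)
	return cdf[len(cdf)-1].count
}
```

If the total were read before the repair, a last bucket that had dipped (for example counts 10, 12, 11) would yield a stale total of 11 and therefore a rank computed against the wrong number of observations.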

if totalCount == 0 {
	// Produce a null value if there were no samples
	return quantileResult{action: appendNil}, nil
}

Here is where we bail and produce a null value for the case of zero observations.

@wolffcm wolffcm merged commit d8995bb into master May 26, 2023
@wolffcm wolffcm deleted the fix/histogram-quantile branch May 26, 2023 18:24
Successfully merging this pull request may close these issues.

The histogramQuantile function returns an incorrect value when there are no observations in the histogram