
LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations #848

Merged
2 commits merged into apache:main on Apr 29, 2022

Conversation

gsmiller (Contributor)

Description

Fix floating point precision issues in TestTaxonomyFacetAssociations

Solution

Ensure we sum the floats in the exact same order when determining our "expected" value and when determining the actual value so we no longer need to rely on delta tolerances
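
For illustration only (not part of this PR's changes), a minimal standalone Java sketch of why summation order matters: float addition is not associative, so adding the same values in a different order can round to a different total.

// Standalone sketch, not from the Lucene test code.
public class FloatOrderDemo {
  public static void main(String[] args) {
    float small = 0.1f;
    float big = 16_777_216f;    // 2^24: beyond this, float can only represent even integers exactly
    float negBig = -16_777_216f;

    float leftToRight = (small + big) + negBig; // small is absorbed by big, then big cancels: prints 0.0
    float reordered = (big + negBig) + small;   // big values cancel first, small survives: prints 0.1

    System.out.println(leftToRight);
    System.out.println(reordered);
  }
}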

Tests

All changes were to a test

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

@gsmiller (Contributor, Author)

OK, shoot. Looks like this didn't quite work. Digging into the failing test.

…dren are visited is a pretty internal implementation detail of faceting
@gautamworah96 (Contributor) commented Apr 27, 2022

Hmm. I took a look at the error in the LUCENE-10529 JIRA and the delta seems to be more than 1 (and this PR is setting it to 1f?)

org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations > testFloatAssociationRandom FAILED
    java.lang.AssertionError: expected:<2605996.5> but was:<2605995.2>

Or is the thinking here that, after getting the order right, the delta will not be greater than 1?

Maybe we could beast this test case for some large number of iterations and then set an acceptable delta limit? I'll take a look at some other test cases in the meantime. I have a feeling that this is not the first time this library has dealt with this problem... :)

@gsmiller (Contributor, Author)

@gautamworah96 thanks for looking! The mismatch in the original Jira was a slightly different check, which is the one now set to a delta of 0. The fix proposed here is to ensure the floats are summed in the exact same order, which leads to identical results. The mismatch originated because the floats were being summed in different orders. Instead of chasing a delta that will work for "all" cases (which may be difficult since the test is random), I propose fixing it in this way.

There's still a different check that does require a delta, because it's difficult to ensure the test case matches the same ordering used internally by the faceting implementation. That's probably the one you're seeing that's still set to 1? I think this is acceptable because only 3 floats are being summed in that case, so the "drift" will be much smaller.
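
To make the "drift will be much smaller" point concrete, here is a standalone sketch (not from the PR) comparing the reordering drift of a long float sum with that of a three-float sum:

// Standalone sketch, not from the Lucene test code: reordering a long float sum can drift
// noticeably, while reordering a sum of only a few floats barely moves the result.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FloatDriftDemo {
  static float sum(List<Float> values) {
    float total = 0f;
    for (float v : values) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    Random random = new Random(42);

    // Many values, roughly like the per-document association values in the random test:
    List<Float> many = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      many.add(random.nextFloat() * 100f);
    }
    float inOrder = sum(many);
    Collections.shuffle(many, random);
    System.out.println("100k floats reordered, drift: " + Math.abs(inOrder - sum(many)));

    // Only three values, like the handful of sums mentioned above:
    List<Float> three = new ArrayList<>(many.subList(0, 3));
    float threeInOrder = sum(three);
    Collections.shuffle(three, random);
    System.out.println("3 floats reordered, drift: " + Math.abs(threeInOrder - sum(three)));
  }
}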

@vigyasharma (Contributor) left a comment


So earlier we were comparing an aggregation done in the order in which docs were added (using the ArrayList) to the faceting aggregation, which runs in DISI order (Index Sort Order?). These orders are expected to be different and can thus sometimes cause higher-than-expected floating point drift.

This makes sense, thanks @gsmiller for fixing this at the root.
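
For context (not from this PR): "DISI order" means the order in which a DocIdSetIterator yields matching docIDs, i.e. increasing docID within each segment. A tiny sketch of that consumption pattern, using a trivial match-all iterator:

// Context sketch, not from the PR. The faceting aggregation sees values in this
// iteration order, which need not match the order the test's ArrayList was populated in.
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

public class DisiOrderDemo {
  public static void main(String[] args) throws IOException {
    DocIdSetIterator disi = DocIdSetIterator.all(5); // trivial iterator over docIDs 0..4
    for (int doc = disi.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = disi.nextDoc()) {
      System.out.println("aggregating value for doc " + doc);
    }
  }
}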

// in the same order when determining expected values and when computing facets. See
// LUCENE-10530:
assertEquals(aggregatedValue, facetResult.value.floatValue(), 0f);
assertEquals(aggregatedValue, facetResult.value.floatValue(), 1f);
Contributor

+1, I was going to comment on this in the last iteration. There may also be some internal drift in the way the aggregation function works on floats. Found a nice read on it here.

Contributor Author

Thanks for sharing!

@gsmiller (Contributor, Author)

@vigyasharma yeah exactly. So the faceting implementation was summing the doc values in a different order from the test case. I wish I could do something similar for the overall aggregation value as well, but it depends on the order that siblings are vended by the taxonomy index. While it's true today that it's reverse-ordinal order, that seems pretty fragile to rely on. I'm more confident relying on the fact that documents are visited in the order they are collected, since that feels like a pretty stable part of the internal workings of Lucene's query evaluation.
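
A hypothetical sketch (invented names; not the actual TestTaxonomyFacetAssociations code) of the pattern described above: accumulate the expected per-dimension float sums in the same order the documents are indexed and collected, so those checks can use a delta of 0, while the comparison against the internally computed total keeps a small tolerance.

// Hypothetical sketch with invented names; not the actual Lucene test code.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import static org.junit.Assert.assertEquals;

public class ExpectedFloatSums {
  /** Per-document (dimension, value) association, used only for this sketch. */
  record DocAssociation(String dim, float value) {}

  /** Accumulate expected sums in the same order the docs were added and are collected. */
  static Map<String, Float> expectedSums(List<DocAssociation> docsInIndexOrder) {
    Map<String, Float> expected = new LinkedHashMap<>();
    for (DocAssociation doc : docsInIndexOrder) {
      // Added in document order, mirroring how the facet implementation visits docs (see above).
      expected.merge(doc.dim(), doc.value(), Float::sum);
    }
    return expected;
  }

  /** Compare against the facet results. */
  static void check(Map<String, Float> expected,
                    Map<String, Float> actualPerDim,
                    float actualTotalFromFacets) {
    float expectedTotal = 0f;
    for (Map.Entry<String, Float> e : expected.entrySet()) {
      // Both sides summed the per-dimension floats in the same order, so exact equality is safe.
      assertEquals(e.getValue(), actualPerDim.get(e.getKey()), 0f);
      expectedTotal += e.getValue();
    }
    // The total sums only a few per-dimension values, and the order the faceting code uses for
    // siblings may differ from ours, so a small tolerance is kept for this check.
    assertEquals(expectedTotal, actualTotalFromFacets, 1f);
  }
}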

@gsmiller (Contributor, Author)

Since it doesn't look like there's any opposition to the approach I took in fixing this, and since this is a test case bug that is occasionally showing up and would be good to address soon, I'm going to go ahead and merge this fix. Happy to rework the solution if anyone has additional feedback. Thanks!

@gsmiller gsmiller merged commit 902a7df into apache:main Apr 29, 2022
@gsmiller gsmiller deleted the LUCENE-10530-taxo-facet-test-float-bug branch April 29, 2022 15:57
wjp719 added a commit to wjp719/lucene that referenced this pull request May 10, 2022
* main: (24 commits)
  LUCENE-10532: remove @Slow annotation (apache#832)
  LUCENE-10312: Add PersianStemmer (apache#540)
  LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871)
  LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869)
  Disable liftbot, we have our own tools
  LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860)
  Make CONTRIBUTING.md a bit more succinct (apache#866)
  LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796)
  Add change line for LUCENE-9848
  LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)
  LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (apache#853)
  LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (apache#859)
  LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (apache#837)
  LUCENE-10188: Give SortedSetDocValues a docValueCount() (apache#663)
  Allow to link to github PR from changes (apache#854)
  LUCENE-10551: improve testing of LowercaseAsciiCompression (apache#858)
  LUCENE-10542: FieldSource exists implementations can avoid value retrieval (apache#847)
  LUCENE-10539: Return a stream of completions from FSTCompletion. (apache#844)
  gradle 7.3.3 quick upgrade (apache#856)
  LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (apache#848)
  ...