
LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations #848

Merged
2 commits merged into apache:main on Apr 29, 2022

Conversation

gsmiller (Contributor)

Description

Fix floating point precision issues in TestTaxonomyFacetAssociations

Solution

Ensure we sum the floats in the exact same order when determining our "expected" value and when determining the actual value so we no longer need to rely on delta tolerances
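
For illustration only (not part of this PR's changes), a minimal standalone Java sketch of why summation order matters: float addition is not associative, so adding the same values in a different order can round to a different total.

// Standalone sketch, not from the Lucene test code.
public class FloatOrderDemo {
  public static void main(String[] args) {
    float small = 0.1f;
    float big = 16_777_216f;    // 2^24: beyond this, float can only represent even integers exactly
    float negBig = -16_777_216f;

    float leftToRight = (small + big) + negBig; // small is absorbed by big, then big cancels: prints 0.0
    float reordered = (big + negBig) + small;   // big values cancel first, small survives: prints 0.1

    System.out.println(leftToRight);
    System.out.println(reordered);
  }
}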

Tests

All changes were to a test

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

@gsmiller (Contributor, Author)

OK, shoot. Looks like this didn't quite work. Digging into the failing test.

…dren are visited is a pretty internal implementation detail of faceting
@gautamworah96 (Contributor) commented Apr 27, 2022

Hmm. I took a look at the error in the LUCENE-10529 JIRA and the delta seems to be more than 1 (and this PR is setting it to 1f?)

org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations > testFloatAssociationRandom FAILED
    java.lang.AssertionError: expected:<2605996.5> but was:<2605995.2>

Or is the thinking here that, after getting the order right, the delta will not be greater than 1?

Maybe we could beast this test case for some large number of iterations and then set an acceptable delta limit? I'll take a look at some other test cases in the meantime. I have a feeling that this is not the first time this library has dealt with this problem... :)

@gsmiller (Contributor, Author)

@gautamworah96 thanks for looking! The mismatch in the original Jira was a slightly different check, which is the one now set to a delta of 0. The fix proposed here is to ensure the floats are summed in the exact same order, which leads to identical results. The mismatch originated because the floats were being summed in different orders. Instead of chasing a delta that will work for "all" cases (which may be difficult since the test is random), I propose fixing it in this way.

There's still a different check that does require a delta, because it's difficult to ensure the test case matches the same ordering used internally by the faceting implementation. That's probably the one you're seeing that's still set to 1? I think this is acceptable because only 3 floats are being summed in that case, so the "drift" will be much smaller.
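
To make the "drift will be much smaller" point concrete, here is a standalone sketch (not from the PR) comparing the reordering drift of a long float sum with that of a three-float sum:

// Standalone sketch, not from the Lucene test code: reordering a long float sum can drift
// noticeably, while reordering a sum of only a few floats barely moves the result.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FloatDriftDemo {
  static float sum(List<Float> values) {
    float total = 0f;
    for (float v : values) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    Random random = new Random(42);

    // Many values, roughly like the per-document association values in the random test:
    List<Float> many = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      many.add(random.nextFloat() * 100f);
    }
    float inOrder = sum(many);
    Collections.shuffle(many, random);
    System.out.println("100k floats reordered, drift: " + Math.abs(inOrder - sum(many)));

    // Only three values, like the handful of sums mentioned above:
    List<Float> three = new ArrayList<>(many.subList(0, 3));
    float threeInOrder = sum(three);
    Collections.shuffle(three, random);
    System.out.println("3 floats reordered, drift: " + Math.abs(threeInOrder - sum(three)));
  }
}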

@vigyasharma (Contributor) left a comment


So earlier we were comparing an aggregation done in the order in which docs were added (using the ArrayList) to the faceting aggregation, which runs in DISI order (Index Sort Order?). These orders are expected to be different and can thus sometimes cause higher-than-expected floating point drift.

This makes sense, thanks @gsmiller for fixing this at the root.
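
For context (not from this PR): "DISI order" means the order in which a DocIdSetIterator yields matching docIDs, i.e. increasing docID within each segment. A tiny sketch of that consumption pattern, using a trivial match-all iterator:

// Context sketch, not from the PR. The faceting aggregation sees values in this
// iteration order, which need not match the order the test's ArrayList was populated in.
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

public class DisiOrderDemo {
  public static void main(String[] args) throws IOException {
    DocIdSetIterator disi = DocIdSetIterator.all(5); // trivial iterator over docIDs 0..4
    for (int doc = disi.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = disi.nextDoc()) {
      System.out.println("aggregating value for doc " + doc);
    }
  }
}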

// in the same order when determining expected values and when computing facets. See
// LUCENE-10530:
assertEquals(aggregatedValue, facetResult.value.floatValue(), 0f);
assertEquals(aggregatedValue, facetResult.value.floatValue(), 1f);
Contributor

+1, I was going to comment on this in the last iteration. There may also be some internal drift in the way the aggregation function works on floats. Found a nice read on it here.

Contributor Author

Thanks for sharing!

@gsmiller (Contributor, Author)

@vigyasharma yeah exactly. So the faceting implementation was summing the doc values in a different order from the test case. I wish I could do something similar for the overall aggregation value as well, but it depends on the order that siblings are vended by the taxonomy index. While it's true today that it's reverse-ordinal order, that seems pretty fragile to rely on. I'm more confident relying on the fact that documents are visited in the order they are collected, since that feels like a pretty stable part of the internal workings of Lucene's query evaluation.
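
A hypothetical sketch (invented names; not the actual TestTaxonomyFacetAssociations code) of the pattern described above: accumulate the expected per-dimension float sums in the same order the documents are indexed and collected, so those checks can use a delta of 0, while the comparison against the internally computed total keeps a small tolerance.

// Hypothetical sketch with invented names; not the actual Lucene test code.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import static org.junit.Assert.assertEquals;

public class ExpectedFloatSums {
  /** Per-document (dimension, value) association, used only for this sketch. */
  record DocAssociation(String dim, float value) {}

  /** Accumulate expected sums in the same order the docs were added and are collected. */
  static Map<String, Float> expectedSums(List<DocAssociation> docsInIndexOrder) {
    Map<String, Float> expected = new LinkedHashMap<>();
    for (DocAssociation doc : docsInIndexOrder) {
      // Added in document order, mirroring how the facet implementation visits docs (see above).
      expected.merge(doc.dim(), doc.value(), Float::sum);
    }
    return expected;
  }

  /** Compare against the facet results. */
  static void check(Map<String, Float> expected,
                    Map<String, Float> actualPerDim,
                    float actualTotalFromFacets) {
    float expectedTotal = 0f;
    for (Map.Entry<String, Float> e : expected.entrySet()) {
      // Both sides summed the per-dimension floats in the same order, so exact equality is safe.
      assertEquals(e.getValue(), actualPerDim.get(e.getKey()), 0f);
      expectedTotal += e.getValue();
    }
    // The total sums only a few per-dimension values, and the order the faceting code uses for
    // siblings may differ from ours, so a small tolerance is kept for this check.
    assertEquals(expectedTotal, actualTotalFromFacets, 1f);
  }
}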

@gsmiller (Contributor, Author)

Since it doesn't look like there's any opposition to the approach I took in fixing this, and since this is a test case bug that is occasionally showing up and would be good to address soon, I'm going to go ahead and merge this fix. Happy to rework the solution if anyone has additional feedback. Thanks!

@gsmiller gsmiller merged commit 902a7df into apache:main Apr 29, 2022
@gsmiller gsmiller deleted the LUCENE-10530-taxo-facet-test-float-bug branch April 29, 2022 15:57
wjp719 added a commit to wjp719/lucene that referenced this pull request May 10, 2022
* main: (24 commits)
  LUCENE-10532: remove @Slow annotation (apache#832)
  LUCENE-10312: Add PersianStemmer (apache#540)
  LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries (main branch) (apache#871)
  LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues (apache#869)
  Disable liftbot, we have our own tools
  LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (apache#860)
  Make CONTRIBUTING.md a bit more succinct (apache#866)
  LUCENE-10504: KnnGraphTester to use KnnVectorQuery (apache#796)
  Add change line for LUCENE-9848
  LUCENE-9848 Sort HNSW graph neighbors for construction (apache#862)
  LUCENE-10524 Add benchmark suite details to CONTRIBUTING.md (apache#853)
  LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode (apache#859)
  LUCENE-10534: MinFloatFunction / MaxFloatFunction calls exists twice (apache#837)
  LUCENE-10188: Give SortedSetDocValues a docValueCount() (apache#663)
  Allow to link to github PR from changes (apache#854)
  LUCENE-10551: improve testing of LowercaseAsciiCompression (apache#858)
  LUCENE-10542: FieldSource exists implementations can avoid value retrieval (apache#847)
  LUCENE-10539: Return a stream of completions from FSTCompletion. (apache#844)
  gradle 7.3.3 quick upgrade (apache#856)
  LUCENE-10530: Avoid floating point precision bug in TestTaxonomyFacetAssociations (apache#848)
  ...