-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set default precisionStep for NumericField and NumericRangeFilter [LUCENE-1712] #2786
Comments
Uwe Schindler (@uschindler) (migrated from JIRA) Should be: doc.add(new NumericField("price").setFloatValue(15.50f)); With a direct constructor, there would be the problem of missing data type information. |
Michael McCandless (@mikemccand) (migrated from JIRA)
Sorry, what does that mean? |
Uwe Schindler (@uschindler) (migrated from JIRA) The methods setFloatValue() etc are for specifying the exact data type. If you would do it in the constructor, the resulting code would be very error-prone. e.g., new NumericField("price", 12.5): Does this mean 12.5 as double or float, is 15 alone meant as (in future) short, byte or int? OK users can add "f" in the float case to it, but this makes it very hard to prevent errors, because Java automatically casts all numeric types to each other suddenly. This is why I added these factories for NumericRangeQuery and the setters here. |
Michael McCandless (@mikemccand) (migrated from JIRA) OK, I agree it's dangerous to let javac auto cast. So, can we set an across the board default of 4? |
Earwin Burrfoot (migrated from JIRA) Am I misunderstanding something or the problem still persists? |
Uwe Schindler (@uschindler) (migrated from JIRA) I think this is simple, we just overload all ctors and factories and remove the precisionStep there. These methods use NumericUtils.DEFAULT_PRECISION_STEP = 4 then. But thee should be a clear not, to do also tests with other step values. additional variants:
I am currentlly not sure (I was thinking the whole time during including into core) to also make NumericRangeQuery work like the other two classes: generic Constructor without datatype and then set the range explicit: new NumericRangeQuery(fieldname[, precisionStep]).setFloatRange(....) Not sure, in this case the API where similar and I have to override only one ctor for different construction parameters. My only problem is, that Queries are normally almost everywhere in Lucene static and unmodifable (beyond boost). |
Uwe Schindler (@uschindler) (migrated from JIRA)
Float are indexed like ints and doubles like longs. The problem here is more that if you would specify the value direct in the constructor, you cannot for sure always give the right type (because Java auto-casts). This is why I have these setFloatValue(), setLongValue() and so on. |
Earwin Burrfoot (migrated from JIRA) Aha! And each time you invoke setFloatValue/setDoubleValue it switches base type behind the scenes? Eeek. |
Michael McCandless (@mikemccand) (migrated from JIRA)
I think that's an acceptable risk. I suppose we could add checking to catch you but I don't think that's needed (we should document clearly that you can't "mix types"). |
Chris M. Hostetter (@hossman) (migrated from JIRA) behind the scenes precision changes based on which set*Value() method is called smells really wrong. I'm not overly familiar with NumericField, but i'm i'm understanding the current suggestion, wouldn't that mkae situations like this come up... NumericField a = new NumericField("price", MY_CUSTOM_PRECISION_STEP, ...);
a.setFloatValue(23.4f); // blows away my custom precision
NumericField b = new NumericField("price", ...);
b.setPrecisionStep(MY_CUSTOM_PRECISION_STEP);
b.setFloatValue(23.4f); // blows away my custom precision
NumericField c = new NumericField("price", ...);
c.setFloatValue(23.4f);
c.setPrecisionStep(MY_CUSTOM_PRECISION_STEP); // only way to get my value used ...that seems sketchy, and really anoying if people try reusing NumericField instances. If the goal is to have good "defaults" based on type then why not just have a constant per type that people can refer to explicitly? if they don't know what number to pick ... as well as a true "default" if they pick nothing. int DEFAULT_STEP = ...;
int SUGGESTED_INT_STEP = ...;
int SUGGESTED_FLOAT_STEP = ...; |
Uwe Schindler (@uschindler) (migrated from JIRA) In my opinion, I would like to keep the precisionStep parameter required and give the 4 constants for each data type in NumericUtils. On the other hand 4 is a (maybe) good default, so I propose, that all ctors/factories getting a precisionStep default it to 4, if left out. precisionStep is a final variable in NumericTokenStream (and so in NumericField), because it does not make sense to change it. If "field" is final, also precisionStep should be final (one field must always use the same precision step). In principle Mike is right, the type is also fixed after first calling setXxxValue, so I could throw an IAE, if somebody calles a setter for a different datatype after the first one. A IllegalStateEx is thrown, when the value was not initialized and the docinverter tries to use the token stream. Here are two ideas to fix this the defaultPrecStep per type:
The javadocs should always clearly note, that one should check out a good precStep. — |
Earwin Burrfoot (migrated from JIRA) Having half of your methods constantly fail with an exception depending on constructor parameter. That just screams - "Split me into two classes!" |
Uwe Schindler (@uschindler) (migrated from JIRA) Four classes! And with #2784 there will be six! Not a good idea. 6 classes for NumericTokenStream, 6 for NumericField and maybe 6 for NumRangeQuery/Filter. brrrrr |
Uwe Schindler (@uschindler) (migrated from JIRA) ...and unmaintainable. I merged the two classes from contrib because of this. It was just duplicate code with some small variances. Always a problem for copy/paste operations. |
Michael McCandless (@mikemccand) (migrated from JIRA)
+1, with javadocs encouraging experimentation. I think the ideas to conditionalize the default according to type add
Let's be sure to call out this use-case in the javadocs! |
Chris M. Hostetter (@hossman) (migrated from JIRA)
This screams out for additional (redundant) constants that are self documenting in their names... int PRECISION_STEP_DEFAULT = 4; // i think?
int PRECISION_STEP_SUGGESTED_INT_RANGEANDSORT = ...; // no idea what this should be
int PRECISION_STEP_SUGGESTED_INT_SORTONLY = 32; // i think?
int PRECISION_STEP_SUGGESTED_FLOAT_RANGEANDSORT = ...; // no idea what this should be
int PRECISION_STEP_SUGGESTED_FLOAT_SORTONLY = 32; // i think?
... |
Michael McCandless (@mikemccand) (migrated from JIRA) How about we add PRECISION_STEP_DEFAULT=4, make that the default for all types, and then note in the javadocs the "interesting" values for precision step (ie for sorting only)? |
Michael McCandless (@mikemccand) (migrated from JIRA) I'm assuming this one is yours Uwe! |
Uwe Schindler (@uschindler) (migrated from JIRA) Attached is a patch with the default precisionStep of 4. The javadocs of NumericRangeQuery list all possible and senseful values. This patch also contains some cleanup in NumericUtils (rename constants) and a lot of other JavaDocs fixes. It also changes the token types of the TokenStream (no difference between 32/64 bit vals needed) and adds a test for them. |
Michael McCandless (@mikemccand) (migrated from JIRA) Patch looks good Uwe! |
Uwe Schindler (@uschindler) (migrated from JIRA) Some updates:
|
Uwe Schindler (@uschindler) (migrated from JIRA) Committed revision: 793823 Thanks Mike! |
This is a spinoff from #2775.
A user using Numeric* should not need to understand what's
"under the hood" in order to do their indexing & searching.
They should be able to simply:
And have a decent default precisionStep selected for them.
Actually, if we add ctors to NumericField for each of the supported
types (so the above code works), we can set the default per-type. I
think we should do that?
4 for int and 6 for long was proposed as good defaults.
The default need not be "perfect", as advanced users can always
optimize their precisionStep, and for users experiencing slow
RangeQuery performance, NumericRangeQuery with any of the defaults we
are discussing will be much faster.
Migrated from LUCENE-1712 by Michael McCandless (@mikemccand), resolved Jul 14 2009
Attachments: LUCENE-1712.patch (versions: 2)
The text was updated successfully, but these errors were encountered: