Add NumericField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache [LUCENE-1701] #2775
Comments
Earwin Burrfoot (migrated from JIRA) I vote for factories - escaping back-compat woes by exposing minimum interface. |
Michael McCandless (@mikemccand) (migrated from JIRA) Uwe can you also open an issue for handling byte/short/Date with
By this same logic, should we remove NumericRangeFilter/Query and use We can't let fear of our back-compat policies prevent progress. I seem to be the only one [who's speaking up, at least] who feels Here's my reasoning: numeric fields are common; many apps need them. For the longest time Lucene could not provide good ootb handling of Such an important & useful functionality deserves a consumable API.
Document doc = new Document();
doc.add(new NumericField("price", 15.50f));
not this:
Document doc = new Document();
Field f = new Field("price", new NumericTokenStream(4).setFloatValue(15.50f));
f.setOmitNorms(true);
f.setOmitTermFreqAndPositions(true);
doc.add(f);
nor, this:
Document doc = new Document();
doc.add(NumericUtils.createFloatField("price", 15.50f));
When I want to reuse, I should be able to call In fact, as a user of this API, I shouldn't even have to know that a NumericUtils should be utility methods used only by the current Here's what I propose:
Why should we make such an excellent addition to Lucene, only to make |
Uwe Schindler (@uschindler) (migrated from JIRA) But the problem that affects NumericTokenStream also affects NumericField: because of type safety it only works with a setXxxValue() method (unless a factory is used), e.g. doc.add(new NumericField("price", precisionStep).setFloatValue(15.50f)); This code is not shorter than: doc.add(NumericUtils.newFloatField("price", precisionStep, 15.50f)); Additionally, with #2773, we could also add Field.Store.XXX to the factory/ctor. OK, the factory solution has the problem that you cannot reuse the field for efficiency, so this is an argument for the extra class that has setXxxValue(). For SortField: the factory code inside NumericUtils is only one line; you only create a conventional SortField with a specific parser. If we do not want to have the factory in NumericUtils, I could also add an additional ctor option to the normal SortField (which is still there: it takes the parser, LUCENE-1478). When all parsers are central in the FieldCache, one can create a SortField with one line of code (the current factory demonstrates this).
Earwin Burrfoot (migrated from JIRA) Mike, I very much agree with everything you said, except "factory is less consumable than constructor" and "add stuff to index to handle NumericField". Out of your three examples the second one is bad, no questions. But first and last are absolutely equal in terms of consumability. If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.
I do use factory methods for all my queries and filters, and it makes me feel warm and fuzzy! :) Under the hood some of them consult FieldInfo to instantiate custom-tailored query variants, so I just use range(CREATION_TIME, from, to) and don't think if this field is trie-encoded or raw. "Simple things should be simple", okay. Complex things should be simple too, argh! :) |
Uwe Schindler (@uschindler) (migrated from JIRA) Here is a first draft of NumericField with the same handling as NumericTokenStream. It is for indexing only; when retrieving stored fields, one would get the numeric field value as a string (according to Number.toString()). Because of this, stringValue() returns the string representation of the numeric value, and tokenStreamValue() returns the NumericTokenStream. For SortField I still have the very strong opinion that an extra class is not needed. A factory is enough (and even too much; supplying the Parser to SortField would be enough). I will later post a patch with this file and the moved/made-public Parsers.
Yonik Seeley (@yonik) (migrated from JIRA) Having the trie parsers public is good (or public factory method(s) to get the right parser given a set of trie params), but shouldn't they stay with the trie classes? Or am I misunderstanding where you are proposing to move the parsers? |
Uwe Schindler (@uschindler) (migrated from JIRA) Yonik, I will explain my intention: SortFields can then simply be created in the following way: new SortField(field, FieldCache.PLAIN_TEXT_INT_PARSER) for a conventional int field (like with SortField.INT) or new SortField(field, FieldCache.NUMERIC_INT_PARSER). Giving a null parser still does the same as before: it uses FieldCache.PLAIN_TEXT_INT_PARSER implicitly.
Yonik Seeley (@yonik) (migrated from JIRA) Regardless of the fact that plain_int parser is on FieldCache, it still doesn't seem like we should add parsers to FieldCache for every field type. It's also the case that a single static parser won't be able to handle all the cases... consider future functionality of using positions or payloads to fill out the full value. A factory allows you to return the correct implementation given the parameters (number of bits to store as position, etc). |
Uwe Schindler (@uschindler) (migrated from JIRA) When this comes (payloads, CSF,...) we will have the new #1906 field cache, where we will have a ValueSource-like thing. Until then, a static parser is enough, and the static parser is still in NumericUtils! I want to move it because of this hack with the unchecked exception. When the new FieldCache is alive this is all nonsense. |
Yonik Seeley (@yonik) (migrated from JIRA) The exception certainly is a hack - but any new field cache API should be powerful enough to handle trie through the normal APIs that anyone else would have to go through when implementing their own field type. If Trie needs to be part of the new field cache implementation, that would be a big red flag. |
Uwe Schindler (@uschindler) (migrated from JIRA)
This is true, the #1906 patch contains a TrieValueSource for that (but it does not apply anymore, as contrib/search/trie is no longer available). Because of this, it can be implemented very cleanly. For easy usage, I would simply suggest making the parsers for the current plain text and trie field cache implementations publicly available as singletons; this is very simple and hurts nobody. I will post a patch for the whole case soon. NumericField will be tested in the NumericRangeQuery index creation (which is currently very ugly, see Mike's comments about the code and about reusing Fields/TokenStreams; with NumericField it looks like any other index code). For Solr (SOLR-940), there is no need to use NumericField (which is just a helper); the code of TrieField can stay as it is, with only some renamings and so on. It just presents a TokenStream to the underlying Solr indexer, as before.
Michael McCandless (@mikemccand) (migrated from JIRA)
Looks good Uwe! Is there an OK default for precisionStep so we don't have |
Michael McCandless (@mikemccand) (migrated from JIRA)
Coolness is in the eye of the beholder? Yes, they are cool in that they give the developer (us) future Static factory classes are a good fit when the impls really should But NumericField vs Field, and SortField vs NumericSortField, are
+1 Wanna make a patch? Then NumericField would just tap in to this extensibility... and,
Because.... we've decided that this is our core approach to numerics? Seriously, I don't see that as unfair. Trie works well. We have Sure, we should make it easy (add extensibility) so external fields
Someday maybe I'll convince you to donate this "schema" layer on top
Whoa, this is all simple stuff? What should be complex about using |
Yonik Seeley (@yonik) (migrated from JIRA)
We decided to move trie from contrib to core because it was the most stable and usable numeric implementation. We did not decide to rewrite it, or make it "special". It's not sufficient for everyone, there will be (many) enhancements to trie, and there will be other numeric field types. Trie is not, and should not be the only numeric field, and should not be baked into the index format. The next step after adding NumericField seems to be "it's a bug if getDocument() doesn't return a NumericField, so we must encode it in the index". If that's the case, I'm -1 on adding NumericField in the first place. |
Michael McCandless (@mikemccand) (migrated from JIRA) I still think we should make NumericSortField strongly typed (not a I think it's far more consumable, from the user's standpoint. I made Ie this: new NumericSortField("price"); is better than this: new SortField("price", FieldCache.NUMERIC_FLOAT_PARSER); or this: SortField.getNumericSortField("price", SortField.FLOAT); Uwe, I agree that if we take the developer's (us) standpoint, the
{Extended,}FieldCache already is the central place that holds parsers |
Yonik Seeley (@yonik) (migrated from JIRA)
Magic. |
Uwe Schindler (@uschindler) (migrated from JIRA) I agree with Yonik, this is too much magic and would not work. There must be at least the type (which would be SortField.FLOAT or something like that). If you keep that in mind, there is really no difference between: new SortField("price", FieldCache.NUMERIC_FLOAT_PARSER) and new SortField("price", FieldCache.PLAIN_TEXT_FLOAT_PARSER) equivalent to: new SortField("price", SortField.FLOAT)
Michael McCandless (@mikemccand) (migrated from JIRA)
But trie is the best we have today? And it's sooooo much better than we had before? Prior to trie, with The addition of numeric indexing to Lucene is a major step forward. Anyway, if push comes to shove (which it seems to be doing!), I can |
Michael McCandless (@mikemccand) (migrated from JIRA)
Woops, sorry, you're right: you'd need to specify the type, at least. So.... how about for SortField we make the parser a required arg (as |
Uwe Schindler (@uschindler) (migrated from JIRA) That is what I was talking about all the time! But none of this is really the best solution. It is too bad that #1906 (not in its current form; it has just not been discussed to the end) will not be included in 2.9. I know we will have to stick with parsers until the end of days, because the new ValueSource stuff and the Uninverters were not introduced early enough to allow removing parsers with 3.0. And Trie fields with positions/payloads would never work with the current FieldCache, so I need no factories. So this would be postponed until that is done. Then we could have a ValueSource for trie fields, where you could add this magic stuff like payloads and so on. But until this becomes reality (including CSF), the static parsers are all we have, and they are best placed in FieldCache (because of the strong linkage with this ugly exception, which should be hidden from the outside). Earwin, Yonik: I know TrieRange is only one implementation of the whole numeric problem, but none of you ever presented your implementation to the public. This is the best we have now, it's included in core and everybody is happy. If you have a better private implementation, you can still use it!
Michael McCandless (@mikemccand) (migrated from JIRA)
Eh, yeah... sometimes things just need a good hashing out! 2.9 is really shaping up to be an awesome release...
Progress not perfection!
Well we are discussing relaxing the back-compat policy... so maybe
We'll cross that bridge when we get there...
I agree.
We shouldn't weaken trie's integration to core just because others |
Earwin Burrfoot (migrated from JIRA)
It's not generic enough to be of use for every user of Lucene, and it doesn't aim to be such. It also evolves, and donating something to Lucene means casting it in concrete. Solr has its own schema approach, and it has its merits and downfalls compared to mine. That's what is nice, we're able to use the same library in differing ways, and it doesn't force its sense of 'best practices' on us.
SOME of them aren't static :-D
You shouldn't integrate into core something that is not core functionality. Think microkernels.
And spend two years deprecating and supporting today's designs after you get a better thing tomorrow. Back-compat Lucene-style and agile design aren't something that marries well.
You're weakening Lucene itself by introducing too much coupling between its components. IndexReader/Writer pair is a good example of what I'm arguing against. A dusty closet of microfeatures that are tightly interwoven into a complex hard-to-maintain mess with zillions of (possibly broken) control paths - remember mutable deletes/norms+clone/reopen permutations? It could be avoided if IR/W were kept to the bare minimum (which most people are going to use), and more advanced features were built on top of it, not in the same place. NRT seems to tread the same path, and I'm not sure it's going to win that much turnaround time after newly-introduced per-segment collection. Some time ago I finished a first version of IR plugins, and enjoy pretty low reopen times (field/facet/filter cache warmups included). (Yes, I'm going to open an issue for plugins once they stabilize enough)
No, I'd like to continue IR cleanup and play with positionIncrement companion value that could enable true multiword synonyms. |
Michael McCandless (@mikemccand) (migrated from JIRA)
Oh, OK.
There's no forcing going on, here. Even had we added the bit into the
Heh.
I agree: if Lucene had all extension points that'd make it possible
We can't let fear of back-compat prevent us from making progress.
Sure, our approach today isn't perfect ("progress not perfection").
I agree, per-segment collection was the bulk of the gains needed for But, not having to write & read deletes to disk, not commit (fsync) And this integration lets us take it a step further with #2390, If you have good simplifications/improvements on the approach here,
I'm confused: I thought that effort was to make SegmentReader's
Well I'm looking forward to seeing your approach on these two! |
Uwe Schindler (@uschindler) (migrated from JIRA) Patch with all changes, including #2761 (it is easier to do this together):
A short note: SortField is only serializable if all custom comparators used are also serializable; maybe we should also note this in the docs. Parsers are automatically serializable (because of the superinterface), but not automatically real singletons (but this is not Lucene's problem).
Uwe Schindler (@uschindler) (migrated from JIRA) I know you will kill me, Yonik, and Mike will love me :-) but there is a possibility to also support Trie fields with the standard SortField.XXX constants using autodetection. Trie fields always start with a shift prefix defining the type and so for sure contain non-digits. So FieldCache could simply test and catch NumberFormatException. So maybe this would be an option: make the default (parser == null in FieldCache's getInts(), ...) detect this automatically. Users could then use SortField/FieldCache as before, ignoring the real encoding. If I implemented this, I could remove the enforcement of a non-null parser in SortField again and make FieldCache do the detection in this case.
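To make the autodetection idea concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: `AutoDetectSketch`, `TOY_PREFIX`, `encodeToy` and `decodeToy` are hypothetical names, and the toy prefixed-hex format only stands in for a trie term that begins with a non-digit shift prefix. The point is just the control flow described above: try the plain-text parser, and fall back when it throws NumberFormatException.

```java
/** Toy model of parser autodetection; an illustration only, not Lucene code. */
final class AutoDetectSketch {
    /** Hypothetical non-digit prefix marking an encoded term (real trie terms use a different layout). */
    static final char TOY_PREFIX = '#';

    /** Encodes an int in the toy format: the prefix followed by 8 hex digits. */
    static String encodeToy(int value) {
        return TOY_PREFIX + String.format("%08x", value);
    }

    /** Decodes a term produced by encodeToy; rejects anything without the prefix. */
    static int decodeToy(String term) {
        if (term.isEmpty() || term.charAt(0) != TOY_PREFIX) {
            throw new NumberFormatException("not a toy-encoded term: " + term);
        }
        return Integer.parseUnsignedInt(term.substring(1), 16);
    }

    /** Tries the plain-text parser first and falls back when it rejects the term. */
    static int parseAuto(String term) {
        try {
            return Integer.parseInt(term);
        } catch (NumberFormatException e) {
            // The prefix is a non-digit, so plain parsing always fails for encoded terms.
            return decodeToy(term);
        }
    }
}
```

In a real field cache the decision would presumably be made once per field rather than once per term; the per-term fallback here only keeps the sketch short.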
Uwe Schindler (@uschindler) (migrated from JIRA) The last patch was still not 100% backwards compatible, now it is. The modified test-tag TestExtendedFieldCache shows it, it will be committed to backwards-compatibility branch |
Yonik Seeley (@yonik) (migrated from JIRA) I'll quote myself, and then attempt to not repeat myself further after this point (the back and forth is silly).
Everyone thinks good APIs, good architecture, and good performance are important. To imply otherwise is also silly.
Michael McCandless (@mikemccand) (migrated from JIRA)
Meaning we wouldn't be forced to specify FieldCache.DEFAULT_INT_PARSER/FieldCache.NUMERIC_UTILS_INT_PARSER when creating SortField or calling FieldCache.getInts? And we'd make the core parsers package private again? So users could simply do I think this is compelling! Why not take this approach? There would then be no user visible changes to how you sort by numeric fields...
Uwe Schindler (@uschindler) (migrated from JIRA) Yes, it would work this way. This would only go against Yonik's objection to mixing Trie too much into the other code, but that is already the case because of the StopFillCacheException usage. When we do #1906, this should be thought about, too. SortField.AUTO is deprecated and will not be changed (it only detects text numbers). There should be a note that it does not work with the "new" NumericFields. I would make the core parsers public to give users full control (on the other hand, I could now also hide the trie parsers). But that is a bad approach: wherever automatisms are involved, one should always have the possibility to pin down one parser. And why do we have the SortField/FieldCache accessors with a parser parameter, when you cannot even use the default ones? P.S.: About payloads & positions and the need for extra parameters to the parser: after adding support for positions or payloads to encode the highest precision, there is still no need for an extra SortField/Parser class or factory. The future "ValueSource" starts to decode the values until a change in shift occurs. This first shift is for sure the highest precision (because of term ordering); if it is 0, it is like now (no payloads/positions); if the first visible shift is > 0, payloads/positions were used and the number of bits there is also known.
Uwe Schindler (@uschindler) (migrated from JIRA)
It is mentioned in the docs that this class is for indexing only:
 * <p><b>Please note:</b> This class is only used during indexing. You can also create
 * numeric stored fields with it, but when retrieving the stored field value
 * from a {@link Document} instance after search, you will get a conventional
 * {@link Fieldable} instance where the numeric values are returned as {@link String}s
 * (according to <code>toString(value)</code> of the used data type).
In my opinion, storing this info in the segments is not doable without pitfalls: if somebody indexes a normal field name in one IndexWriter session and starts to index using NumericField in the next session, he would have two segments with different encodings and two different "flags". When these two segments are merged later, what should be done with the flag? If we want to have such schemas, they must be index-wide, and we have no possibility in Lucene for that at the moment. If somebody creates a schema that can do this (by storing the schema in a separate file next to the segments file), we can think about it again (with all the problems, like: MultiReader on top of two indexes with different schemas - forbid that because the schemas differ?). All this tells me we should not do this; it is the task of Solr, my own project panFMP, or Earwin's own schema to enforce it.
Michael McCandless (@mikemccand) (migrated from JIRA)
The proposal was not to store it into the segments file (which I agree has serious problems, since it's global). I had considered FieldInfos (which is "roughly" Lucene's "schema", per segment), but that too has clear problems. My proposal was the flags per-field stored in the fdt file. In that file, we are already writing one byte's worth of flags (only 3 of the bits are used now), for every stored field instance. This is in FieldsWriter.java ~ line 181. The flags now record whether each specific field instance was tokenized, compressed, binary. FieldsReader then uses these flags to reconstruct the Field instances when building the document. These bits are never merged; they are copied (because they apply to that one field instance, in that one document). My proposal was to add another flag bit (numeric) and make use of that to return a NumericField instance when you get your document back. It would have no impact on the index size, since we still have 5 free bits to use. But, it is technically a (one bit) change to the index format, which people seriously objected to. So net/net I'm OK going forward without it.
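The per-field flag byte described above can be pictured with a small stand-alone Java sketch. The class name `StoredFieldFlags` and the concrete bit values are assumptions made for illustration (the comment names the three existing flags but not their values), and the `NUMERIC` bit is the hypothetical fourth flag from the proposal, which was not adopted.

```java
/** Toy model of a one-byte flag field per stored field instance; not Lucene code. */
final class StoredFieldFlags {
    // Illustrative bit assignments for the three flags named in the comment.
    static final int TOKENIZED = 1;
    static final int BINARY = 1 << 1;
    static final int COMPRESSED = 1 << 2;
    // Hypothetical extra bit for the proposed "numeric" marker.
    static final int NUMERIC = 1 << 3;

    /** Packs the four booleans into a single flag byte. */
    static byte pack(boolean tokenized, boolean binary, boolean compressed, boolean numeric) {
        int bits = 0;
        if (tokenized) bits |= TOKENIZED;
        if (binary) bits |= BINARY;
        if (compressed) bits |= COMPRESSED;
        if (numeric) bits |= NUMERIC;
        return (byte) bits;
    }

    /** True if the numeric bit is set, i.e. a reader could rebuild a numeric field. */
    static boolean isNumeric(byte flags) {
        return (flags & NUMERIC) != 0;
    }

    /** Number of bits in the byte not claimed by the given mask of used flags. */
    static int freeBits(int usedMask) {
        return 8 - Integer.bitCount(usedMask & 0xFF);
    }
}
```

With three flags in use, `freeBits(TOKENIZED | BINARY | COMPRESSED)` gives the 5 free bits mentioned in the comment, so the extra marker would not grow the stored data.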
Uwe Schindler (@uschindler) (migrated from JIRA) Here an updated patch:
The patch (and the one before) checks the autodetection two times (so you see in the trie tests, how one would use SortField with trie), what do you think about all this?
I agree, and for the reasons noted before, I would like to see NumericField as only a helper for indexing. Storing the number as a plain-text string (as done in the patch) does not justify returning a NumericField when getting stored fields. Once Yonik has committed #2773, I will do some additional NumericField fine tuning, but the patch is finished now.
Michael McCandless (@mikemccand) (migrated from JIRA) Uwe, with your patch, it looks like if I ask for eg doubles w/ parser=null, and then ask again w/ parser=FieldCache.DEFAULT_DOUBLE_PARSER, I get double entries stored? Is there some way to take the auto-detected parser and use it (not null) in the cache key? |
Uwe Schindler (@uschindler) (migrated from JIRA)
This is correct: the cache key is different for a null parser vs. an explicit parser, because the result may be different (which is unlikely), but they are two different things. When you ask for an auto-cache (we should deprecate the AUTO part in the field cache, too! It is not used anymore with the new sorting code, even when SortField.AUTO is enabled), you also get different cache keys!
Michael McCandless (@mikemccand) (migrated from JIRA) But, with the new public exposure of the field cache parsers, this is a newly added trap? You could accidentally consume 2X the RAM. Since auto-detection of the parser simply means a specific parser was chosen, why can't we then cache using that chosen parser? Then you would not risk 2X memory usage. Or, maybe we should leave the parsers private to not have this risk. |
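The suggestion of caching under the parser that was actually chosen can be sketched in plain Java as follows. This is an assumption-laden toy, not FieldCache: `ParserCacheSketch`, `IntParser`, and the `loads` counter are hypothetical, and the terms are passed in directly instead of being read from an index.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy cache that resolves a null parser before building the cache key; not Lucene code. */
final class ParserCacheSketch {
    /** Hypothetical parser interface, standing in for the field cache parsers. */
    interface IntParser {
        int parse(String term);
    }

    /** Singleton default parser, so identity-based map keys behave predictably. */
    static final IntParser DEFAULT_INT_PARSER = new IntParser() {
        @Override
        public int parse(String term) {
            return Integer.parseInt(term);
        }
    };

    private final Map<String, Map<IntParser, int[]>> cache = new HashMap<>();

    /** Counts how many arrays were actually built, to make double entries visible. */
    int loads = 0;

    int[] getInts(String field, String[] terms, IntParser parser) {
        // Resolve null first, so null and the explicit default share one entry.
        IntParser resolved = (parser != null) ? parser : DEFAULT_INT_PARSER;
        Map<IntParser, int[]> perField = cache.computeIfAbsent(field, f -> new HashMap<>());
        int[] values = perField.get(resolved);
        if (values == null) {
            values = new int[terms.length];
            for (int i = 0; i < terms.length; i++) {
                values[i] = resolved.parse(terms[i]);
            }
            perField.put(resolved, values);
            loads++;
        }
        return values;
    }
}
```

Asking once with `null` and once with `DEFAULT_INT_PARSER` then returns the same array and increments `loads` only once, which is the 2X RAM trap being avoided.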
Uwe Schindler (@uschindler) (migrated from JIRA) Not sure! And it is a trap for the BYTE and SHORT caches, too! But for sure, the old, never used auto-cache should be deprecated, too! |
Uwe Schindler (@uschindler) (migrated from JIRA) Here a new patch:
Michael McCandless (@mikemccand) (migrated from JIRA) The last patch looks great Uwe! I think we're nearly done here:
Uwe Schindler (@uschindler) (migrated from JIRA)
For the whole Trie API or only NumericUtils? If the former, I would do this with a general JavaDoc commit after this issue. If only NumericField, I could do it now.
Sure, we had this the last time, too (I like my variant more, so I always automatically write it in that way)
I will open an issue because of this byte/short trie fields (#2784) NumberTools does not handle zero-padding, so it could stay deprecated. Numbers encoded with NumberTools cannot be natively sorted on at all (only using StringIndex) and can only handle longs. You may mean NumberUtils from Solr in contrib/spatial, but this class is not yet released and is only used for spatial.
Will come. |
Michael McCandless (@mikemccand) (migrated from JIRA)
For the whole thing; I think an added NOTE at each class level
Which "last time"? Is there somewhere in the code base now where We generally try (though, don't always succeed) to follow Sun's coding
Actually, I believe it does do 0 padding and handles negative numbers Ie, I can take a short now, call longToString, index with that, do
OK but since we've marked it 3.1 (which I think is OK; though in |
Uwe Schindler (@uschindler) (migrated from JIRA)
This was not against the change. With "last time" I meant that some time ago you mentioned the same in a different patch from me. I will change it. My note was only that I "automatically" create such code, because I personally find it more readable. That was all :-)
You cannot do this with NumberTools. NumberTools uses a special radix 36 encoding (and not radix 10 like normal numbers). Like the NumericUtils encoding, it is not human-readable and so cannot be parsed as the plain Number.toString() format. To convert back, you need the method from the same class. Because of this you have two possibilities: write your own parser and pass it to SortField/FieldCache, or sort using StringIndex (because it is sortable according to String.compareTo). So it can be deprecated. If somebody wants to do encoding and parsing with FieldCache.getShorts(), there is no way around a DecimalFormat with zero-padding and the problem with negative numbers.
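Both points can be checked with a few lines of plain Java. The sketch below is an illustration only: `SortableStringsSketch` and its helpers are hypothetical, `radix36` uses `Long.toString(value, 36)` and is not NumberTools' actual output format, and `pad` uses a fixed width of 5 as an arbitrary assumption.

```java
import java.util.Locale;

/** Toy checks for radix-36 strings and zero-padded decimals; not Lucene code. */
final class SortableStringsSketch {
    /** Zero-pads to width 5 (the sign, if any, counts toward the width). */
    static String pad(int value) {
        return String.format(Locale.ROOT, "%05d", value);
    }

    /** Plain radix-36 rendering, only to show that such strings are opaque to decimal parsers. */
    static String radix36(long value) {
        return Long.toString(value, Character.MAX_RADIX);
    }

    /** True if the standard decimal short parser accepts the string. */
    static boolean parsesAsShort(String s) {
        try {
            Short.parseShort(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** True if String.compareTo orders the two padded values the same way as numeric order. */
    static boolean lexicalOrderMatches(int a, int b) {
        int lexical = Integer.signum(pad(a).compareTo(pad(b)));
        int numeric = Integer.signum(Integer.compare(a, b));
        return lexical == numeric;
    }
}
```

For non-negative values such as 7 and 42 the padded strings sort correctly, while for -5 and -10 the lexical order is reversed, which is the negative-number problem mentioned above.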
Michael McCandless (@mikemccand) (migrated from JIRA)
Whoa! Sorry, you are right. Nothing in FieldCache can handle decoding Though it does allow you to do RangeQuery, with short/byte, and you So I now agree: we should deprecate NumberTools entirely, now. If people ask how to handle short/byte before we resolve #2784,
Uwe Schindler (@uschindler) (migrated from JIRA) Attached is a patch with the latest changes:
Michael McCandless (@mikemccand) (migrated from JIRA) Uwe, can we give a default (4?) for precisionStep, when creating a NumericField, NumericRangeFilter/Query? |
Uwe Schindler (@uschindler) (migrated from JIRA)
Can you open an issue? There are some problems with defining a good default. In my environment, 8 gives the best results; 4 is only a little faster.
So I do not want to set a default without enough tests from different people/scenarios, and those will only come once 2.9 is out and everybody tries it out :( I will commit this patch in a day or two after applying LUCENE-1701-test-tag-special.patch to 2.4-test-branch.
Earwin Burrfoot (migrated from JIRA) Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8. Though, if you want really fast dates, choosing hour/day/month/year as precision steps is vastly superior, plus it also clicks well with user-selected ranges. Still, I dumped this approach for uniformity and clarity.
Uwe Schindler (@uschindler) (migrated from JIRA)
I think 4 for ints is a good start, better than 4 for longs (which produces 16 different precision terms and up to 31 term enums [= precision changes] per range). 6 is a good idea: it gains a little over 8 but does not produce too many precision changes. I tested that also with my 2 M numeric-only index here. Mike: as you see, the precision step is a useful configuration parameter, so a default should be chosen carefully.
That is clear, because these precisions exactly fit users' queries in the case of dates (users often select full days for the range). Nice to hear that you use TrieRange! What is your index spec and measured query speeds (if it does not go too far into company internals)?
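The numbers quoted for precisionStep 4 on longs (16 terms, up to 31 term enums) follow from simple arithmetic, sketched here in stand-alone Java. `PrecisionStepSketch` and its methods are hypothetical helpers, not Lucene API. The per-value term count assumes one term per shift 0, step, 2*step, ... below the bit width; the `2n - 1` bound is inferred from the figure 31 in the comment above and is an assumption, not taken from documentation.

```java
/** Toy arithmetic for trie precision steps; not Lucene code. */
final class PrecisionStepSketch {
    /** Terms indexed per value: one for each shift 0, step, 2*step, ... below the bit width. */
    static int termsPerValue(int bits, int precisionStep) {
        return (bits + precisionStep - 1) / precisionStep;
    }

    /** The shift amounts at which a value would be indexed. */
    static int[] shifts(int bits, int precisionStep) {
        int[] result = new int[termsPerValue(bits, precisionStep)];
        for (int i = 0; i < result.length; i++) {
            result[i] = i * precisionStep;
        }
        return result;
    }

    /** The value with its lowest `shift` bits cleared, i.e. the lower-precision prefix. */
    static long prefix(long value, int shift) {
        return (value >>> shift) << shift;
    }

    /** Assumed bound on precision changes per range: both range ends at each level, minus the shared top. */
    static int maxPrecisionChanges(int termsPerValue) {
        return 2 * termsPerValue - 1;
    }
}
```

Under these assumptions a long gets 16 terms at step 4, 11 at step 6 and 8 at step 8, while an int at step 4 gets 8, which matches the intuition in the thread that a step tuned for ints is costly for longs.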
Earwin Burrfoot (migrated from JIRA) >>> Design for today. >> NRT seems to tread the same path, and I'm not sure it's going to win that much turnaround time after newly-introduced per-segment collection. > But, not having to write & read deletes to disk, not commit (fsync) > And this integration lets us take it a step further with #2390, >> Some time ago I finished a first version of IR plugins, and enjoy pretty low reopen times (field/facet/filter cache warmups included). (Yes, I'm going to open an issue for plugins once they stabilize enough) |
Michael McCandless (@mikemccand) (migrated from JIRA)
Unfortunately we can't easily conditionalize the default by int vs long. Ie you use NumericField like this:
NumericField f = new NumericField("price", 4);
f.setFloatValue(15.50f);
Agreed! But, it need not be "perfect". Advanced users can test & iterate to find the best tradeoff for their particular field's value distribution. For slow ranges now with RangeQuery (because of many unique terms), NumericRangeQuery will be a massive speedup with eg a default of 4. New users shouldn't have to understand what precisionStep means, or anything about "what's under the hood", in order to use NumericField. I should be able to simply: new NumericField("price", 15.50); Erring towards more terms (and faster searches) is fine, I think, because in a "typical" index the text fields will dwarf any small added disk space (hence my proposal of 4 as the default precisionStep).
OK I'll open a new issue. |
Michael McCandless (@mikemccand) (migrated from JIRA) OK I opened spinoff issues #2786 (default for precisionStep) and #2787 (rename RangeQuery -> TextRangeQuery, or something). |
Michael McCandless (@mikemccand) (migrated from JIRA)
That's a good point, though if it's contrib and you're a contrib committer it lessens the challenge, but the challenge is still there....
No pain no gain?
The problem is I've seen fsync take a ridiculous amount of time; it's not very predictable. So I think we do need some way to not put fsync between the changes to the index and the ability to search those changes.
Exactly; that's what #2390 is doing (flush new segments to a RAMDir).
OK. |
Uwe Schindler (@uschindler) (migrated from JIRA) Final patch. |
Uwe Schindler (@uschindler) (migrated from JIRA) Committed backwards tests: revision 787714 Thanks Mike!
In discussions about #2747, Mike & I wanted to add a new NumericField to o.a.l.document specifically for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory that creates a preconfigured Field instance with norms and tf off, optionally a stored text (#2773), and the TokenStream already initialized. On the other hand, NumericUtils.newXxxSortField could be moved to NumericSortField.
Yonik and I lean toward using the factory for both; Mike leans toward creating the new classes.
Also, the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (#2552) makes it possible to supply a parser when instantiating a SortField, it would be good to have the static parsers in FieldCache publicly available. SortField would init its member variable to them (instead of NULL), making the code a lot easier (FieldComparator has these ugly null checks when retrieving values from the cache).
Moving the Trie parsers into FieldCache as static instances as well would make the code cleaner, and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently it's public because NumericUtils is in o.a.l.util).
Migrated from LUCENE-1701 by Uwe Schindler (@uschindler), resolved Jun 23 2009
Attachments: LUCENE-1701.patch (versions: 7), LUCENE-1701-test-tag-special.patch, NumericField.java