New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Reduce Fieldable, AbstractField and Field complexity [LUCENE-2310] #3386

Closed

asfimport opened this issue Mar 11, 2010 · 49 comments

Labels

legacy-jira-fix-version:4.0-ALPHA legacy-jira-priority:Major legacy-jira-resolution:Fixed module:core/index type:task

asfimport commented Mar 11, 2010 •

edited

Loading

In order to move field type like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality that storing fields, most of which are being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possible Fieldable), moving much of the functionality into Field and FieldType.

Migrated from LUCENE-2310 by Chris Male, resolved Mar 20 2012
Parent: #3384
Attachments: LUCENE-2310.patch, LUCENE-2310-Deprecate-AbstractField.patch (versions: 3), LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch (versions: 2), LUCENE-2310-Deprecate-DocumentGetFields-core.patch

Author

asfimport commented Mar 11, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

+1

Author

asfimport commented Mar 12, 2010

Chris Male (migrated from JIRA)

The challenge presented in this work is the pervasiveness of the Fieldable class. Its used in several hundred places through the source, but the majority are in tests, and in Document itself. Therefore part of this work will be also to move as many of the tests over to using Field, and working on the Document API as well.

Author

asfimport commented Mar 13, 2010

Chris Male (migrated from JIRA)

Attaching first version of the patch which deprecates AbstractField.

Moves the properties and getters/setters down into Field.
Field now only implements Fieldable
Field now allows its value to be set to null through its construction. This allows subclasses to set the fieldData to their own
NumericField now extends Field, overridding the setValue methods as they are not supported
LazyField also now extends Field
AbstractField is now no longer used anywhere.

Author

asfimport commented Mar 13, 2010

Uwe Schindler (@uschindler) (migrated from JIRA)

You should also not be able to set the TokenStream in NF.

IMO, i would keep AbstractField and only remove Fieldable, as interfaces are not wanted in Lucene.

-1 for this patch in its current form.

Author

asfimport commented Mar 13, 2010

Chris Male (migrated from JIRA)

You should also not be able to set the TokenStream in NF.

Yes good point.

IMO, i would keep AbstractField and only remove Fieldable, as interfaces are not wanted in Lucene

Actually I would like to remove both actually. There doesn't seem much reason to keep AbstractField, especially since its already dependent on Field.XYZ and seems only to only store all the various properties, most of which will be moved away to FieldType anyway.

Would a compromise be to also add an UOE to setting the TokenStream in NumericField? It does still have the concept of a TokenStream, so it is a Field, but a specialisation which handles the TokenStream itself.

Author

asfimport commented Mar 13, 2010

Tim Smith (migrated from JIRA)

Personally, i like keeping Fieldable, (or having AbstractField just with abstract methods and no actual implementation)

for feeding documents, i use custom Fieldable implementations to reduce amount of setters called, as Fields of different types have different constant settings

Author

asfimport commented Mar 13, 2010

Earwin Burrfoot (migrated from JIRA)

These settings will go to FieldType?

Author

asfimport commented Mar 13, 2010

Chris Male (migrated from JIRA)

Hi Tim,

Yeah I see what you are saying, but as Earwin says, the 'settings' will be pushed into the FieldType, so they'll be removed from Fieldable as well.

Author

asfimport commented Mar 13, 2010

Chris Male (migrated from JIRA)

I should note, to prevent confusion, that my patch is just the beginning of this work, designed to illustrate the direction I'm heading.

Author

asfimport commented Mar 13, 2010

Chris Male (migrated from JIRA)

Addressed the issues raised by Uwe about the TokenStream in NumericField. NumericField now throws a UOE on setTokenStream. Since it also extends Field which has its own TokenStream field, NumericField now uses the field from TokenStream rather than its own.

The more and more this is discussed the clearer it is that Field should be the base class of the Field hierarchy, and not AbstractField or Fieldable. The issue of having all the setters and configurations will be addressed in LUENE-2308 when we move them all to FieldType. Field will become a simple tuple consisting of at least a value and type, and possibly a TokenStream.

NumericField and LazyField are customisations of Field controlling certain aspects of the tuple. For NumericField that is the TokenStream and setting the value. For LazyField that is the value.

Author

asfimport commented Mar 13, 2010

Uwe Schindler (@uschindler) (migrated from JIRA)

There is one problem in backwards:
If somebody has the following code:

AbstractField field = new Field(...)

This will no longer work.

Author

asfimport commented Mar 13, 2010

Chris Male (migrated from JIRA)

Addressed Uwe's issue again.

Only solution is to change Field to extend AbstractField again, even though AbstractField is dead code.

Also fixed:

Added final to setter methods that are also final in AbstractField for consistency sake
Fixed import for javadocs in CheckIndex and FieldsReader

Author

asfimport commented Mar 14, 2010

Chris Male (migrated from JIRA)

Spent more time pondering how to deprecate Fieldable and the major issue is Document#getFields(), which returns a modifiable List<Fieldable>. Because it is modifiable, consumers can add to it directly rather than through Document.

If it were unmodifiable, then it would be possible to control adding Fieldables in Document, which would then allow us to wrap Fieldable instances in a Field subclass, meaning Document would only have a List<Field>.

Given this limitation, I'm currently thinking about not trying to deprecate Fieldable in 3.x, but instead adding the unmodifiable list method and deprecating #getFields(). I would also add some functionality for removing Fieldables, which seems to be all thats lack in Document. Then in 4.x I would deprecate Fieldable.

Slow process, but I think by deprecating AbstractField now we have already made a step forward to improving this hierarchy in preparation for the FieldType classes.

Remaining in this work is a code cleanup of all 3 classes, so that code is understandable when we add in FieldType.

Author

asfimport commented Mar 14, 2010

Shai Erera (@shaie) (migrated from JIRA)

How about if getFields() will return an Iterable<Field>? It'll need to be deprecated and a new method created, but it can be simple enough solution to avoid one adding fields directly to Document w/o passing through doc.add first.

If you do end up removing getFields(), then I'd like to have a clear() on Document. I once proposed it and was told to call getFields().clear() instead, which allows me to reuse my Document instance. If I lose that option, I'd appreciate if a direct clear() on Document will exist instead.

Author

asfimport commented Mar 14, 2010

Chris Male (migrated from JIRA)

Hi Shai,

How about if getFields() will return an Iterable<Field>? It'll need to be deprecated and a new method created, but it can be simple enough solution to avoid one adding fields directly to Document w/o passing through doc.add first.

I'm not really in favour of Iterable in this case as it inhibits people calling .size(). Knowing how many fields a Document has might be useful. But I agree it would certainly prevent people from adding new Fields and is a simple solution. Do you think getting the number of fields has a use case at all?

If you do end up removing getFields(), then I'd like to have a clear() on Document. I once proposed it and was told to call getFields().clear() instead, which allows me to reuse my Document instance. If I lose that option, I'd appreciate if a direct clear() on Document will exist instead.

Yes. I will certainly add clear(). Do you think its useful to support removing specific fields as well i.e. remove(String fieldName)?

Author

asfimport commented Mar 14, 2010

Shai Erera (@shaie) (migrated from JIRA)

I can't think of why one would want to know how many fields a Document has. But even then, we can add a numFields() method to Document. The Iterable thingy will just prevent a lot of unnecessayy work I think.

About remove(String fieldName), I don't see a case for that as well. A Document is a point in time object - you parse a document, extract metadata, converts the entire flow to a Lucene object (in this case Document) and passes it along to IW. You then move on to the next document.

So I wouldn't know why would someone want to remove a field from a Document ... but perhaps I just haven't run into the use case yet. Iterable btw will at least alert the user if he did that, because a ConcurrentModificationException will be thrown (if we're using a List-based iterator) ...

Author

asfimport commented Mar 14, 2010

Chris Male (migrated from JIRA)

Haha, i whole heartedly agree with all your points.

One option instead of adding another method to return Iterable<Fieldable>, is for Document to implement Iterable<Fieldable> and to return a unmodifiable Iterator.

This would then fit nicely with future ideas I had about providing iterators based on FieldType criteria, allowing the IW to then retrieve an iterator of only those fields which are to be indexed for example.

Author

asfimport commented Mar 16, 2010

Shai Erera (@shaie) (migrated from JIRA)

i like the idea of Document to implement Iterable, but how does that solve the case where someone wants to query how many fields a document has? Will you still have getFields(), only now it will return an unmodifiable collection?

I guess the unmod collection can be returned even today, right?

BTW, what happens if getFields() return an unmod collection, but someone calls doc.add(Field)? I think the unmod collection prevents you from adding to that collection wrapper, but not for that collection to be changed from under the hood? If that's true, then that could cause some trouble ... so getFields() will really return a snapshot of Document, which means we need to clone Fields ...

Gets too complicated no? Maybe just do: (1) Doc implements Iterable and (2) Doc exposes numFIelds(), add(Field)?

About remove(field), I thought of a possible scenario though I still don't think it's interesting enough - suppose that you pass your Document through a processing pipeline/chain, each handler adds fields as metadata to the Document. For example, annotators. It might be that a field A exists, only for a handler down the chain to understand A's meaning and then replace it w/ A1 and A2. For that you'll want to be able to move a field ... I guess we could add a remove method to Document, and if it'll be called while the fields are iterated on, a CME will be thrown, which is perfectly fine with me.

Author

asfimport commented Mar 16, 2010

Chris Male (migrated from JIRA)

Hi Shai,

i like the idea of Document to implement Iterable, but how does that solve the case where someone wants to query how many fields a document has?

It doesn't, but then I'd add a numFields() method maybe. It seems like something with a small use case and so having it has a method on the side seems ideal.

Will you still have getFields(), only now it will return an unmodifiable collection?

Yes and no. getFields will remain but with a modifiable list. I will then deprecate the method and recommend people use the Iterable. This gives everybody a chance to migrate during the 3.x versions.

Maybe just do: (1) Doc implements Iterable and (2) Doc exposes numFIelds(), add(Field)?

Yup lets do that. Unfortunately getFields will remain until 4.0.

About remove(field), I thought of a possible scenario though I still don't think it's interesting enough - suppose that you pass your Document through a processing pipeline/chain, each handler adds fields as metadata to the Document. For example, annotators. It might be that a field A exists, only for a handler down the chain to understand A's meaning and then replace it w/ A1 and A2. For that you'll want to be able to move a field ... I guess we could add a remove method to Document, and if it'll be called while the fields are iterated on, a CME will be thrown, which is perfectly fine with me.

With the idea of having remove(...) I am trying to foresee what people might be doing via getFields() and thus are not going to be able to do when its gone. We will have the ability to add and iterate, so having the functionality to remove seems to complete it. I completely agree that if something happens and a CME is thrown, then that problem should be left to the user.

I think this clarifies this direction. Document will be changed as follows:

Document will become Iterable<Fieldable>
getFields() will be deprecated in favour of the Iterable
numFields() will be added returning the number of fields
remove(String) will be added allowing someone to remove Fields with the given name. If a CME occurs, thats up to the user to handle.

Cheers Shai!

Author

asfimport commented Mar 16, 2010

Shai Erera (@shaie) (migrated from JIRA)

I'm sorry for the confusion - I got used to all the deprecation discussions so much that it's embedded in my replies :) - when I wrote "instead getFields" I meant that it will be deprecated, and we'll carry it w/ us until 4.0 is out.

So overall we agree on the changes that need to be made. BTW, when you deprecate a method, you usually change it to call the new API or change it to use the new data structures or whatever. So we need to think how to impl getFields such that if one calls remove, numFields or use the iterator on an interleving manner, his code doesn't break ... I don't think it should be hard but it might be a good idea to even write such (deprecated) unit test

Author

asfimport commented Mar 16, 2010

Chris Male (migrated from JIRA)

So overall we agree on the changes that need to be made. BTW, when you deprecate a method, you usually change it to call the new API or change it to use the new data structures or whatever. So we need to think how to impl getFields such that if one calls remove, numFields or use the iterator on an interleving manner, his code doesn't break ... I don't think it should be hard but it might be a good idea to even write such (deprecated) unit test

I'm not sure we have to change getFields. We can just deprecate it, and point people to the new methods. I think it'd be more effort than its worth to create a List impl that calls the new methods. Was that what you were implying? I do agree its worth writing a test to ensure all old functionality can be done via the new methods somehow.

Author

asfimport commented Mar 16, 2010

Shai Erera (@shaie) (migrated from JIRA)

That was usually the approach. You provide new methods, deprecate old ones, however both work and not in a XOR mode. Both should work and we need to ensure that if people call both they still function properly. Unless this has changed, in which case it should be clearly documented.

But I don't think it is a big problem to support both? If Document still keeps its fields in a List then all should remain the same. We could have a 4.0 note to switch to a Map based DS to better support remove, but that's questionable because we'll need to maintain ordering on the fields (the order in which they inserted) though personally I don't think it should matter much to the user, however that's the current implementation.

Author

asfimport commented Mar 16, 2010

Chris Male (migrated from JIRA)

I recommend we keep it as a List since that facilitates having different iterators by FieldType criteria more. A Map would support get and remove better, but I think we want to move people to using Iterators and the remove method is there for a case we don't know of yet.

I'll create a patch with these ideas shortly.

Cheers!

Author

asfimport commented Mar 16, 2010

Shai Erera (@shaie) (migrated from JIRA)

I agree. Then keeping both deprecated and new API should be supported easily.

Author

asfimport commented Mar 20, 2010

Chris Male (migrated from JIRA)

New version of the AbstractField deprecation patch which cleans up Field so its more readable, deferring documentation to Fieldable where possible in preparation for deprecating most the methods in Fieldable.

Author

asfimport commented Mar 20, 2010

Chris Male (migrated from JIRA)

Attaching patch for changes to Document implementing above ideas discussed above

Document#getFields() is deprecated
Document now implements Iterable<Fieldable> and provides an iterator() method
Added a getNumFields() method for returning the number of fields in a Document

Having not paid attention earlier, there is already a removeFields method in Document, so it didn't need to be added.

There are some greater problems with Document related to this issue. It provides methods for both Fieldables and Fields, getField(String) for example, could throw a ClassCastException if used in combination with AbstractField. Because the overall goal of this work is to reduce these 3 classes to just Field in 4.x, I think we can leave the methods as most will be deprecated. However some documentation improvements might prevent any CCEs.

I'll spin off another issue later to do that once this stuff has been committed.

Author

asfimport commented Mar 21, 2010

Chris Male (migrated from JIRA)

Attaching patch that migrates core (including tests) away from Document#getFields().

Author

asfimport commented Mar 21, 2010

Uwe Schindler (@uschindler) (migrated from JIRA)

I would make the iterator() unmodifiable (without remove()), so map the list before with Collections.umodifiableList(). Any comments?

Author

asfimport commented Mar 21, 2010

Chris Male (migrated from JIRA)

Hi Uwe,

Yes you are right. That was one of the original motivations for this work! I will make the appropriate change now.

Author

asfimport commented Mar 21, 2010

Chris Male (migrated from JIRA)

Attached new version of Document change which makes the Iterator unmodifiable. Noted this fact in the javadoc for the iterator() method.

Author

asfimport commented Mar 21, 2010

Chris Male (migrated from JIRA)

The patch I just attached makes a test in InstantiatedIndex fail because it tries to remove a field through the iterator. This illustrates a concern I have with making the iterator unmodifiable, there is no efficient way to remove a field while iterating, without also running into a CCE. I will think over this issue a little bit before deciding whether to continue with the unmodifiable iterator or not.

Author

asfimport commented Mar 31, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

I wonder if, like we're thinking for analysis, we can substantially
reduce the Document/Field API that's consumed by the indexer....

Really all indexer needs to do is:

Iterate the indexed fields, for each getting a name, a TokenStream
(or possibly something more pared-back – don't need close nor
reset), and a FieldType containing indexing settings (omits
positions, omits term freq, etc.)
Iterate the stored fields, for each getting name and value (or,
possibly, only a serialize() method, instead?)

The iterators would be unmodifiable.

Indexer could care less how these are implemented.

Lucene would still provide a fully featured concrete Document/Field
classes that're app-friendly, ie malleable, with clear, add/remove
fields, etc. I think Document & Field can be fully concrete? Ie no
more AbstractField/Fieldable. It's simply a convenience class offered
to apps.

Indexer would not see that class. It would only see the above minimal
interface. So for expert apps that don't want to use Lucene's
convenient Document/Field, they can easily make their own impls of
that interface.

Author

asfimport commented Mar 31, 2010

Chris Male (migrated from JIRA)

+1

I think this is a great idea.

Having a clear separation from the 'core space' where the indexer sits, and the 'user space' gives us the freedom to address the Document/Fieldable class stack problems.

We could introduce an IndexableDocument abstraction (which Document would implement) which provides the stripped down API that the indexer needs for Documents, which is just a way to get the fields to index and store.

Given that the indexer need only know the name, type and value/tokenstream for a Field, we can visualize this as just a tuple. Therefore the IndexableDocument, rather than returning Fields, could return instances of a new Tuple class, which would be immutable. Field would then implement Tuple.

Document and Field would then be exactly as suggested, 'user space' app friendly mutable classes.

On the search side, the 'core space' wouldn't know of the idea of a Document, but rather a set of tuples. We would then allow different consumers of those tuples to be provided. One such consumer would use the tuples to build Documents and Fields again.

Author

asfimport commented Mar 31, 2010

Earwin Burrfoot (migrated from JIRA)

Yahoo! +1 for introducing barebones interface for the indexer.

Author

asfimport commented Mar 31, 2010

Shai Erera (@shaie) (migrated from JIRA)

+1 for this simplification. Can we just name it Indexable, and omit Document from it? That way, it's both shorter and less chances for users to directly link it w/ Document.

One thing I didn't understand though, is what will happen to ir/is.doc() method? Will those be deprecated in favor of some other class which receives an IR as parameter and knows how to re-construct Indexable(Document)?

Author

asfimport commented Mar 31, 2010

Chris Male (migrated from JIRA)

Can we just name it Indexable, and omit Document from it? That way, it's both shorter and less chances for users to directly link it w/ Document.

+1

One thing I didn't understand though, is what will happen to ir/is.doc() method? Will those be deprecated in favor of some other class which receives an IR as parameter and knows how to re-construct Indexable(Document)?

I imagine some sort of callback which the IR/IS call with the information they take from the index. One default implementation we could provide would re-construct Indexables, and one could re-construct Documents for the userspace.

Author

asfimport commented Mar 31, 2010

Earwin Burrfoot (migrated from JIRA)

Can't we keep interfaces for indexing and retrieval separate?
Isn't that the whole freaking point - having a minimal-area interface to index stuff, with possibly widely varying implementations. You can't reconstruct Indexable by definition, as it is user-defined and provided.
Your average back-compat Document may be one implementation, but I can have an implementation that just reads stuff from the socket and shoves it into the index, streaming-style - how're you going to reconstruct that?

Author

asfimport commented Mar 31, 2010

Shai Erera (@shaie) (migrated from JIRA)

Right Earwin - agreed.

I'd like to summarize a brief discussion we had on IRC around that:
The idea is not to provide another interface/class for search purposes, but rather expose the right API from IndexReader, even if it might be a bit low-level. API like getIndexedFields(docId) and getStorefFields(docId), both optionally take a FieldSelector, should allow the application to re-construct its Indexable however it wants. And IR/IS don't need to know anything about that.
To complete the picture for current users, we can have a static reconstruct() on Document which takes IR, docId and FieldSelector ...

BTW, I'm not even sure getIndedxedFields can be efficiently supported today. Just listing it here for completeness.

Author

asfimport commented Mar 31, 2010

Chris Male (migrated from JIRA)

Earwin,

Yeah definitely we want to keep things separate. I am infavour of Indexable being an indexing only construct, staying in the index part of the core space. However Document will be a user space construct that spans both indexing and searching since its API is designed to be flexible and easy to use and manipulate.

Writing implements of Indexable will be an expert thing I imagine, so if you index bytes taken from a socket, then I imagine you'll choose an implementation on the search side that can handle it. If your stream from a socket has some custom stuff, then you could write a custom consumer on the search side that builds whatever search construct you want.

Author

asfimport commented Mar 31, 2010

Uwe Schindler (@uschindler) (migrated from JIRA)

I am also +1 on the indexer interface.

I just repeat myself: We still need TokenStream, an AttributeSource alone is too less. But that is away from that issue: Indexable provides an iterator of fields that consist of name and TokenStream and some options (possibly like omitNorms). If you just dont want to have close() in TokenStream, let's remove it. end() is needed for offsets, the indexer need to take care. incrementToken() is the iterator approach. What else is there? Reset may be invisible to indexer (I would refactor that and would make a subclass of TokenStream that supports reset, ResetableTokenStream - like Tokenizer supports reset(Reader), which is also a subclass). The abstract TokenStream then is only consisting of incrementToken() and end() + the AttributeSource access methods. Attributes needed by indexer are only TermToBytesRefAttribute, PositionIncrementAtt, OffsetAttribute and PayloadAttribute.

Author

asfimport commented Mar 31, 2010

Chris Male (migrated from JIRA)

Hi Uwe,

I envisaged the Indexable providing Tuples of field name, field type, and TokenStream as it is today. So I think we shouldn't run into any problems?

Author

asfimport commented Mar 31, 2010

Uwe Schindler (@uschindler) (migrated from JIRA)

Yeah!

Author

asfimport commented Mar 23, 2011

Chris Male (migrated from JIRA)

Reactivating this issue.

New patch against 3x (since its all about deprecations). AbstractField and Fieldable are deprecated. Document is changed a little in preparation for using Field. Code cleanup in Field.

I'm going to leave the suggested larger changes to Document and Field for a later issue. I really want to get these classes out of trunk so I can then visualize the code better.

Author

asfimport commented Mar 23, 2011

Simon Willnauer (@s1monw) (migrated from JIRA)

Hey Chris,

good that you reactivate this issue! I was looking into similar stuff while working on docvalues since it really needs to add stuff to Field / Fieldable. With a cleanup and eventually FieldType this would be way less painless I guess. I have a couple of questions and comments to the current patch.
Btw. I like the fact that the previous patch was uploaded March 21 2010 and the latest took 1 year to come up on march 23 2011 :)

Why do you reformat all the stuff in Field, is that necessary here at all? I mean its needed eventually but for the deprecation of things it only bloats the patch really doesn't it?
When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. So I see that this still needs to subclass Fieldable / AbstractField but could it stand alone now so that we can simply remove the extends / implements on Field once we drop things in 4.0? I think it looks good from looking at the patch though
I don't like the name getAllFields on Document since it implies that we have a getPartialFields or something. I see that you can not use getFields since it only differs in return type which doesn't belong to the signature though. Maybe we should implement Iterable<Field> here and offer an additional method getFieldsAsList or maybe getFields(List<Field> fields)
once we have this in what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or doe we have two totally new classes FieldTyps and FieldValue, something like this:
```
 
class FieldValue {
  FieldType type;
  float boost;
  String name;
  Object value;
}
```
I wonder if this patch raises tons of deprecation warnings all over lucene where Fieldable was used? In IW we use it all over the place though. We must fix that in this issue too otherwise uwe will go mad I guess :)

thanks for bringing this up again!

Author

asfimport commented Mar 23, 2011

Chris Male (migrated from JIRA)

Thanks for taking a look at this Simon.

Why do you reformat all the stuff in Field, is that necessary here at all? I mean its needed eventually but for the deprecation of things it only bloats the patch really doesn't it?

Because for me this issue is about reducing the complexity of these classes and Field is a mess. Making it more readable reduces the complexity. If needs be I will do this in two patches, but I don't feel this issue is resolved till the code in Field is readable.

When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. So I see that this still needs to subclass Fieldable / AbstractField but could it stand alone now so that we can simply remove the extends / implements on Field once we drop things in 4.0? I think it looks good from looking at the patch though

I don't really understand what you're suggesting here. In 3x where the deprecations will be occurring Field has to continue to extend AbstractField. Yes in 4.0 we can drop that extension but addressing the deprecations is not in the scope of 3x.

I don't like the name getAllFields on Document since it implies that we have a getPartialFields or something. I see that you can not use getFields since it only differs in return type which doesn't belong to the signature though. Maybe we should implement Iterable<Field> here and offer an additional method getFieldsAsList or maybe getFields(List<Field> fields)

Yeah good call. I think implementing Iterable<Field> is best, but it will also require adding a count() method to Document since often people retrieve the List to get the number of fields.

once we have this in what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or doe we have two totally new classes FieldTyps and FieldValue

Once FieldType is in, all the various metadata properties (isIndexed, isStored etc) will be moved to FieldType, leaving Field as what you suggest as FieldValue. Field will contain its type, boost, name, value. If we have Analyzers on FieldTypes, then we will be able to remove the TokenStream from Field.

I wonder if this patch raises tons of deprecation warnings all over lucene where Fieldable was used? In IW we use it all over the place though. We must fix that in this issue too otherwise uwe will go mad I guess

Yeah but not in 3x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to the Field. I could do this in many places already of course, but that core classes like IW would have to remain as they are.

I will wait for your thoughts on the reformating and then make a new patch.

Author

asfimport commented Mar 23, 2011

Simon Willnauer (@s1monw) (migrated from JIRA)

I don't really understand what you're suggesting here. In 3x where the deprecations will be occurring Field has to continue to extend AbstractField. Yes in 4.0 we can drop that extension but addressing the deprecations is not in the scope of 3x.

What I mean here is that if I would simply remove the extends AbstractField from Field would it still compile or are there any dependencies from AbstractField? IMO AbstractField should just be empty now right?

Author

asfimport commented Mar 23, 2011

Chris Male (migrated from JIRA)

Yes Field would still compile if you removed the extends. However if we empty AbstractField then any client code that also extends AbstractField would break. Thats why I deprecate the whole class but leave its code in. We could empty it and change it to extend Field, I think that would still work.

Author

asfimport commented Mar 23, 2011

Simon Willnauer (@s1monw) (migrated from JIRA)

Yeah but not in 3x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to the Field. I could do this in many places already of course, but that core classes like IW would have to remain as they are.

So, what is the reason for doing this in 3.x at all, can't we simply drop stuff in 4.0 and let 3.x alone?

Simon

Author

asfimport commented Mar 23, 2011

Chris Male (migrated from JIRA)

So, what is the reason for doing this in 3.x at all, can't we simply drop stuff in 4.0 and let 3.x alone?

Very good question. Certainly we are simplifying the codebase and I feel that Field is what most users use (not AbstractField). But I know some expert users do use AbstractField. But maybe they can handle the hard change?

asfimport closed this as completed

This was referenced Aug 24, 2022

Separately specify a field's type [LUCENE-2308] #3384

Closed

First cut at column-stride fields (index values storage) [LUCENE-2186] #3262

Closed

NumericField should be stored in binary format in index (matching Solr's format) [LUCENE-3065] #4138

Closed

AbstractField should not have a tokenStream member variable, it is unused. each subclass should manage their own if needed [LUCENE-2546] #3620

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment