New Document and Field API [LUCENE-1597] #2671
Comments
Michael Busch (migrated from JIRA) You should start by looking at newdoc/demo/DocumentProducer.java. This class shows how a user of Lucene would add documents to a Lucene index with the new API.
Yonik Seeley (@yonik) (migrated from JIRA) Separating FieldDescriptor and FieldValue sounds interesting... but I don't see the need for DocumentDescriptor, or the need to set it on the IndexWriter (and then have to have the distinction between fixed and variable fields). What about something along the lines of:
class Field {
FieldDescriptor descriptor;
String fieldName; // or alternately, the descriptor could contain the name
FieldValue[] fieldValues;
float boost;
}
class InputDocument {
Map<String fieldName, Field> OR List<Field> fields;
}
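The sketch above is pseudocode (for example, `Map<String fieldName, Field>` is not valid Java). A minimal compilable version might look like the following. It picks the `Map` variant of the two options and keeps the field name on `Field`. The contents of `FieldDescriptor` and `FieldValue` are never shown in this thread, so the `indexed`/`stored` flags, the `Object` payload, the default boost of 1.0, and the `add`/`get`/`size` helpers are all assumptions made only so the example compiles; they are not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: describes HOW a field is handled, not its data.
final class FieldDescriptor {
    final boolean indexed; // assumed flag, not from the patch
    final boolean stored;  // assumed flag, not from the patch

    FieldDescriptor(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
    }
}

// Hypothetical stand-in: wraps a single value of a field.
final class FieldValue {
    final Object value;

    FieldValue(Object value) {
        this.value = value;
    }
}

// Follows the shape of the sketch: descriptor + name + values + boost.
final class Field {
    final String fieldName;
    final FieldDescriptor descriptor;
    final List<FieldValue> fieldValues = new ArrayList<>();
    float boost = 1.0f; // assumed default

    Field(String fieldName, FieldDescriptor descriptor) {
        this.fieldName = fieldName;
        this.descriptor = descriptor;
    }

    // Appends a value and returns this field so calls can be chained.
    Field add(Object value) {
        fieldValues.add(new FieldValue(value));
        return this;
    }
}

// The "Map<fieldName, Field>" variant; a List<Field> would work as well.
final class InputDocument {
    private final Map<String, Field> fields = new LinkedHashMap<>();

    void add(Field field) {
        fields.put(field.fieldName, field);
    }

    Field get(String fieldName) {
        return fields.get(fieldName);
    }

    int size() {
        return fields.size();
    }
}
```

In this form a single `FieldDescriptor` instance can be passed to several `Field` objects, so per-field settings are configured once while the values vary per document, and nothing has to be registered on the IndexWriter.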
Michael McCandless (@mikemccand) (migrated from JIRA) This looks great! Many random thoughts...
This is largely a cleaner restructuring of what's already held in [...]
It's also quite different from Lucy/KS's approach which is to use [...]
This approach subdivides a type into N fully orthogonal attributes, so [...]
This can sometimes be awkward because attributes are "flat", eg [...]
How would you turn on/off [future] CSF storage? A separate attr?
A NumericFieldAttribute seems awkward (one shouldn't have to turn on/off [...]
Presumably we could make an "iterate over all fields" utility so [...]
In this model, can one re-use FieldValue for maximizing indexing [...]
StoredFieldsWriter is needing to do instanceof checks & casting, [...]
It'd be great to land this before 2.9 (and cut back to Java 1.4) but [...]
Should we make "get me your TokenStream" (get/setAnalyzer) a part of [...]
Can a single FieldDescriptor be shared among many fields? Seems like [...]
Also how would we correspondingly fix FieldInfos to "generically" [...]
One thing I like about DocumentDescriptor is it can be the basis for [...]
Can we maybe rename Descriptor -> Type? Eg FieldDescriptor -> [...]
Michael Busch (migrated from JIRA) Thanks for the thorough review, Mike. Reading your response made me really excited, because you exactly understood most of the thoughts I put into this code, without me even mentioning them :) Thanks for writing them down! I started including your suggestions into my patch and will reply with more detail to your individual points as I'm working on them.
Michael Busch (migrated from JIRA)
I was thinking about adding a separate attribute. But here is one [...]
So maybe we should state in our javadocs that a reader must support [...]
Btw. the same is true for fields that provide the data as a [...]
Michael Busch (migrated from JIRA)
Done.
I agree, this should be possible. I removed the name.
Yep I agree. Some things in this prototype are quite goofy, because I [...]
Steven Rowe (@sarowe) (migrated from JIRA) Can this be resolved (maybe as duplicate?), since [...]
Or maybe there are other not-already-implemented ideas here that could be refactored to work with the current status quo? (I didn't study the patch.)
Michael McCandless (@mikemccand) (migrated from JIRA) I think it's more or less dup'd w/ #3384 ... we can open new issues for any differences.
Uwe Schindler (@uschindler) (migrated from JIRA) Closed after release.
This is a super rough prototype of what a new document API could look like. It's basically what I came up with during a long flight across the Atlantic :)
It is not integrated with anything yet (like IndexWriter, DocumentsWriter, etc.) and heavily uses Java 1.5 features, such as generics and annotations.
The general idea sounds similar to what Marvin is doing in KS, which I found out by reading Mike's comments on #1906; I haven't looked at the KS API myself yet.
Main ideas:
Again, this is not a "real" patch, but rather a demo of how a new API could roughly work.
Migrated from LUCENE-1597 by Michael Busch, resolved Sep 19 2012
Attachments: lucene-new-doc-api.patch