-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove shared doc stores [LUCENE-2555] #3629
Comments
Michael Busch (migrated from JIRA) What shall we do about index backward-compatibility? I guess 4.0 has to be able to read shared doc stores? So a lot of that code we can't remove? :( |
Jason Rutherglen (migrated from JIRA) Maybe we should break backwards-compatibility for the RT branch? Or just ship an RT specific JAR to keep things simple? |
Michael McCandless (@mikemccand) (migrated from JIRA) The reading side of shared doc stores is quite trivial; I think we should keep it (keep back compat). |
Shai Erera (@shaie) (migrated from JIRA) What are the performance implications of removing shared doc stores? From what I understand, if several segments share the same doc store, then when they are merged, the doc stores aren't merged. Which is a great benefit, especially if you intend to store large fields. I understand (mostly from the discussion on the PTDW) that with the move to a per-thread approach, the doc stores cannot be shared between segments created by different threads, but what about segments created by the same thread? Are we losing that functionality? |
Jason Rutherglen (migrated from JIRA) Shai, I think Mike has outlined the pros and cons in other Basically when going to DWPTs we're losing shared doc stores |
Michael Busch (migrated from JIRA)
We discussed that in #3400 (close to the bottom). The problem is that doc stores only help you if you merge segments that all share the same store. With DWPT that's extremely unlikely.
I agree we have to test this when this patch is complete. My hope is that we save in other places (removing the interleaving step of the per-thread postings, no wait queue that serializes writing to doc stores) so that overall we won't be slower. |
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, remember that shared doc stores is not as good an opto as it used to be, because we are now able to bulk-copy both stored fields and term vectors during merging. However, bulk merging only happens if the field name -> number mapping is congruent, b/w the merged segment and the one segment being merged. Unfortunately, you can easily unexpectedly break this (see LUCENE-1737) but eg adding diff't fields to your docs, or adding same fields just in a different order. |
Shai Erera (@shaie) (migrated from JIRA) Thanks for the explanation. Let's remember though that not all apps are multi-threaded, but I think most are, so designing for the most is better than making the other few more performing. I'm generally ok with that, just wanted to better understand the reasons. |
Michael Busch (migrated from JIRA) Checkpointing what I have so far:
Some testcases fail still. Next I will look into removing PerDocBuffer, exception handling, |
Jason Rutherglen (migrated from JIRA) Michael, nice! A lot is cleaned up. |
Michael Busch (migrated from JIRA) Changed the patch to also remove PerDocBuffer. It changes StoredFieldsWriter and TermVectorsTermsWriter to write the data directly to the final IndexOutput, without buffering it in a temporary PerDocBuffer. Several tests still fail due to exception handling or thread-safety problems (which is expected - haven't tried very hard to fix them yet). I will commit this patch to the realtime branch and work on fixing the tests with a separate issue. |
Michael Busch (migrated from JIRA) Committed revision 978805. |
With per-thread DocumentsWriters sharing doc stores across segments doesn't make much sense anymore.
See also #3400.
Migrated from LUCENE-2555 by Michael Busch, resolved Jul 24 2010
Attachments: lucene-2555.patch (versions: 2)
The text was updated successfully, but these errors were encountered: