
Support updateDocument() with DWPTs [LUCENE-2956] #4030

Closed
asfimport opened this issue Mar 9, 2011 · 17 comments

Comments


asfimport commented Mar 9, 2011

With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document.

We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See #3400 for more details.


Migrated from LUCENE-2956 by Michael Busch, resolved Apr 14 2011
Attachments: LUCENE-2956.patch (versions: 2)
Linked issues:


Michael Busch (migrated from JIRA)

An idea from Mike on how to fix this problem:

To avoid the full-stop, I think during the flush we can have two global delete pools. We carefully sweep all DWPTs and flush each, in succession. Any DWPT not yet flushed is free to continue indexing as normal, putting deletes into the first global pool, flushing as normal. But, a DWPT that has been flushed by the "sweeper" must instead put deletes for an updateDocument carefully into the 2nd pool, and not buffer the delete into DWPTs not yet flushed.
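The two-pool idea above can be sketched roughly as follows. All class and method names here are hypothetical illustrations, not Lucene's actual API: deletes from DWPTs the sweeper has not yet reached go into the first pool, while DWPTs already flushed route their deletes into the second pool so they never reach segments that were flushed before the delete happened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-global-pool idea: during a flush sweep,
// DWPTs not yet flushed buffer deletes into pool 1 as normal; a DWPT that
// has already been flushed by the sweeper puts its updateDocument deletes
// into pool 2 instead, and does not buffer them into unflushed DWPTs.
class DeletePoolSketch {
    final List<String> pool1 = new ArrayList<>(); // deletes from not-yet-flushed DWPTs
    final List<String> pool2 = new ArrayList<>(); // deletes from already-flushed DWPTs

    void bufferDelete(String term, boolean dwptAlreadyFlushed) {
        if (dwptAlreadyFlushed) {
            pool2.add(term); // must not be applied to DWPTs still waiting to flush
        } else {
            pool1.add(term); // flows through the normal global delete path
        }
    }
}
```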


Jason Rutherglen (migrated from JIRA)

What is the status of this one? If no one's working on it, I can take a stab.


Simon Willnauer (@s1monw) (migrated from JIRA)

Jason, I am working on this. The first patch is not too far away.


Simon Willnauer (@s1monw) (migrated from JIRA)

FYI, I have a working patch for this. It needs some cleanup, so I will hopefully upload it at the beginning of next week...


Simon Willnauer (@s1monw) (migrated from JIRA)

Attaching an initial patch. This patch uses an entirely non-blocking approach to deletes, based on a specialized linked list that only uses CAS operations.
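As a rough illustration of the CAS-only linked-list idiom (the real DocumentsWriterDeleteQueue in the patch is considerably more involved), a minimal lock-free append queue might look like the sketch below; the class and method names are made up for this example:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of a lock-free linked list that appends using only CAS,
// in the spirit of the patch's delete queue. Illustrative only.
class CasQueue<T> {
    static final class Node<T> {
        final T item;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T item) { this.item = item; }
    }

    private final Node<T> head = new Node<>(null);               // sentinel
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);

    void add(T item) {
        Node<T> node = new Node<>(item);
        while (true) {
            Node<T> t = tail.get();
            // try to link the new node behind the current tail
            if (t.next.compareAndSet(null, node)) {
                tail.compareAndSet(t, node); // swing tail; fine if another thread already did
                return;
            }
            tail.compareAndSet(t, t.next.get()); // help advance a stale tail, then retry
        }
    }

    int size() {
        int n = 0;
        for (Node<T> cur = head.next.get(); cur != null; cur = cur.next.get()) n++;
        return n;
    }
}
```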

Since this issue is quite complex, I tried to add as many useful comments as possible inline in the patch to make reviewing easier. So for details, check out the patch.

All tests on the realtime branch pass with this patch. (Once in a while I see a failure in the healthiness test, but the assumptions in that test seem to be too strict and I need to fix that separately.)

Reviews are very very much appreciated!


Jason Rutherglen (migrated from JIRA)

I think I have an idea, however can you explain the ticketQueue?


Simon Willnauer (@s1monw) (migrated from JIRA)

> I think I have an idea, however can you explain the ticketQueue?

Sure. With DWPTs the flush process is concurrent and several DWPTs can flush at the same time, so we must maintain the order of the flushes before we can apply a flushed segment and the frozen global deletes it is buffering. The reason for this is that the global deletes mark a certain point in time at which we took a DWPT out of rotation and froze the global deletes.

Example: DWPT 'A' starts flushing and freezes the global deletes; then DWPT 'B' starts flushing and freezes all deletes that have occurred since 'A' started. If 'B' finishes before 'A', we need to wait until 'A' is done; otherwise the deletes frozen by 'B' are not applied to 'A' and we might miss deleting documents in 'A'.

The ticket queue simply ensures that we publish the frozen deletes and the newly created segment in the same order in which the corresponding DWPTs started flushing. When a DWPT finishes flushing, we update its ticket and then check whether the head of the queue can be removed / published. If so, we keep publishing until the head of the queue cannot be published yet or the queue is empty.
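The ordering rule described above can be sketched as follows. The names are illustrative, and the real ticket also carries the frozen deletes, not just the segment:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the ticket-queue idea: tickets are enqueued in the order
// DWPTs *start* flushing; a finished flush marks its ticket done, and
// segments are published only from the head, preserving flush order.
class TicketQueueSketch {
    static final class Ticket {
        final String segment;
        boolean done;
        Ticket(String segment) { this.segment = segment; }
    }

    private final Deque<Ticket> queue = new ArrayDeque<>();
    final List<String> published = new ArrayList<>();

    synchronized Ticket startFlush(String segment) {
        Ticket t = new Ticket(segment);
        queue.addLast(t); // order of flush starts, not flush finishes
        return t;
    }

    synchronized void finishFlush(Ticket t) {
        t.done = true;
        // publish from the head only while the head ticket is done
        while (!queue.isEmpty() && queue.peekFirst().done) {
            published.add(queue.removeFirst().segment);
        }
    }
}
```

With this, if 'B' finishes before 'A', nothing is published until 'A' completes, at which point both are published in start order.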


Michael McCandless (@mikemccand) (migrated from JIRA)

Patch looks great Simon!

Very impressive how this patch makes delete handling fully lockless!
The inline comments are very helpful...

Some small stuff:

  • I think we do need delete-by-Term[] (and Query[]) to be atomic?
    (IW has these methods, and I think they are atomic now?) This is
    from the TODOs in DocumentsWriterDeleteQueue...

  • Does TestRollingUpdates really need to use LogMP?

  • raceing -> racing; {@link BuffereDeletes} -> {@link
    BufferedDeletes}; DocumensWriter -> DocumentsWriter


Michael Busch (migrated from JIRA)

Cool patch! :)
Though it worries me a little how complex the whole delete/update logic is becoming (not only the part this patch adds).

Originally we decided to not go with sequenceIDs partly because we thought the implementation might be too complex, but I think it'd be simpler than the current approach that uses bits.

The sequenceIDs approach we had in the beginning was also completely lockless in a very very simple way.
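A minimal sketch of what a lockless sequence-id scheme could look like, assuming a single atomic counter and point-in-time reader snapshots. This is purely illustrative of the idea being discussed, not trunk's BitVector-based deletes:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the sequence-id idea: every operation draws a monotonically
// increasing id from one atomic counter, and a point-in-time reader simply
// ignores any delete whose id is newer than its snapshot id.
class SeqIdSketch {
    private final AtomicLong clock = new AtomicLong();

    long nextSeq() {
        return clock.incrementAndGet(); // lockless global ordering of operations
    }

    // a doc deleted at deleteSeq is invisible to a reader whose snapshot
    // was taken at snapshotSeq >= deleteSeq; deleteSeq == 0 means "not deleted"
    static boolean isDeleted(long deleteSeq, long snapshotSeq) {
        return deleteSeq != 0 && deleteSeq <= snapshotSeq;
    }
}
```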

Anyway, I have yet to take a closer look here. Just something that might be worth discussing.


Simon Willnauer (@s1monw) (migrated from JIRA)

> Though it worries me a little how complex the whole delete/update logic is becoming (not only the part this patch adds).

I could not agree more. It's been very complex to make all the tests pass and to figure out all the nifty little corner cases here. A different, somewhat simpler approach would be great. Eventually, for Searchable RAM Buffers, we might need to switch to sequence ids anyway, but I think for landing DWPT on trunk we can go with the current approach.
I will update the latest patch, commit it to the branch, and merge with trunk again. Once that is done I will set up a Hudson build for RT so we give it a little exercise while we prepare moving to trunk.


Simon Willnauer (@s1monw) (migrated from JIRA)

Here is an updated patch that fixes some spellings, adds atomic updates for Term[] and Query[], and removes the LogMergePolicy restriction from TestRollingUpdates.


Simon Willnauer (@s1monw) (migrated from JIRA)

I committed that patch and merged with trunk.


asfimport commented Apr 13, 2011

Jason Rutherglen (migrated from JIRA)

Simon, nice work. I agree with Michael B. that the deletes are super complex. We had discussed using sequence ids for all segments (not just the RT-enabled DWPT ones); however, we never worked out a specification, e.g. for things like wrap-around if a primitive short[] was used.

Shall we start again on #3388? I think we still need/want to use sequence ids there. The RT DWPTs shouldn't have so many documents that using a long[] for the sequence ids is too RAM consuming?


asfimport commented Apr 14, 2011

Simon Willnauer (@s1monw) (migrated from JIRA)

> Shall we start again on #3388? I think we still need/want to use sequence ids there. The RT DWPTs shouldn't have so many documents that using a long[] for the sequence ids is too RAM consuming?

Jason, I think nothing prevents you from starting work on this again. Yet I think we should freeze the branch now and only allow merging, bug fixes, tests, and documentation fixes until we land on trunk. Once we are there we can freely push stuff into the branch again and make it work with sequence ids.

thoughts?


Simon Willnauer (@s1monw) (migrated from JIRA)

committed to branch


asfimport commented Apr 14, 2011

Jason Rutherglen (migrated from JIRA)

> Jason I think nothing prevents you from start working on this again.... Yet, I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk. Once we are there we can freely push stuff in the branch again and make it work with seq. ids.

OK, great. I remember now that our main concern was the memory usage of a short[] (for the seq ids) if the total number of documents is large (e.g. tens of millions). Also, at some point we'd have double the memory usage when we roll over to the next set, until the previous readers are closed.
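For a rough sense of the numbers behind this concern, assuming one 8-byte long per document (the figures are back-of-the-envelope, not measurements):

```java
// Rough arithmetic for the seq-id memory concern: one long (8 bytes) per
// document, with two sets alive at once during a rollover.
class SeqIdMemory {
    static long bytesForSeqIds(long docCount) {
        return docCount * 8L; // long[] entry per doc
    }

    static long bytesDuringRollover(long docCount) {
        return bytesForSeqIds(docCount) * 2; // old + new set until readers close
    }
}
```

At tens of millions of documents this is on the order of 80 MB per set, and double that transiently during rollover.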

> I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk

Maybe once the #3388 sequence ids work for deletes, we can look at creating a separate branch that implements seq-id deletes for all segments and compare it with the BV approach, e.g. on performance, memory usage, and simplicity.


asfimport commented Apr 14, 2011

Simon Willnauer (@s1monw) (migrated from JIRA)

> Maybe once #3388 sequence ids work for deletes, we can look at creating a separate branch that implements seq id deletes for all segments, and compare with the BV approach. Eg, performance, memory usage, and simplicity.

I don't think we need to create a different branch; by then DWPT will be on trunk and we can simply compare to trunk, no?
