[PATCH] improve searching under high concurrency [LUCENE-1337] #2414
Comments
Brian Gardner (migrated from JIRA) This patch applies to version 2.3.1.
Yonik Seeley (@yonik) (migrated from JIRA) Thanks Brian, also see #1828 for more history and a bunch of options.
Michael McCandless (@mikemccand) (migrated from JIRA) Duplicate of #1828.
Jason Rutherglen (migrated from JIRA) The problem is the same but the solution is not. Do they each need separate patches listing more specifically how they solved the problem? Each solution has pluses and minuses. NIOFSDirectory doesn't work on Windows. DescriptorsFSDirectory will, on many Lucene installations, quickly max out the file descriptors. I would like to see both committed to trunk. MMapDirectory is in the trunk and has limitations as well, mainly that (at least as I understand it) it loads all the files into RAM.
Michael McCandless (@mikemccand) (migrated from JIRA) Jason, are you thinking of #1492 (NIOFSDirectory)?
Jason Rutherglen (migrated from JIRA) Yonik checked in a modification of FSDirectory into #1828. I took that code and made NIOFSDirectory, which is standalone so that it can be committed. It is checked into #1828 as lucene-753.patch.
Michael McCandless (@mikemccand) (migrated from JIRA)
OK. I think it's a good idea to separately offer an FSDirectory implementation that uses positional reads (via FileChannel) to avoid synchronization. I'd also like to somehow make that implementation the default on those platforms (all except Windows?) where there are clear concurrency gains. I.e., maybe change FSDirectory.getDirectory to return NIOFSDirectory if it's not on Windows, but also offer a getDirectory that takes the IMPL so you can force it to pick a different IMPL. In general I think Lucene should default to good out-of-the-box performance, i.e., without requiring special knowledge or tuning on the user's part, so long as there's no difficult tradeoff. We should probably also change the name to something less generic than "nio", though I can't think of an alternative offhand. But one question: it looks like NIOFSIndexInput copies most of the BufferedIndexInput source rather than subclassing it. Why was that? Can we change that back to a subclass, perhaps opening up members of BufferedIndexInput a bit if necessary?
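For readers unfamiliar with the positional reads mentioned above, here is a minimal sketch of the idea using `FileChannel.read(ByteBuffer, long)`. It is not the code from any of the patches discussed here; the class name `PositionalReader` and its `readAt` method are hypothetical and exist only for illustration. Because each call passes its own offset and never touches the channel's shared position, several threads can read through one channel without a `synchronized` block around seek-then-read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene or of the attached patch.
final class PositionalReader implements AutoCloseable {
    private final FileChannel channel;

    PositionalReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    /**
     * Reads up to {@code length} bytes starting at {@code position}.
     * The channel's own position is never read or modified, so callers
     * do not need to coordinate with each other.
     */
    byte[] readAt(long position, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = position;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos); // positional read
            if (n < 0) {
                break; // reached end of file before filling the buffer
            }
            pos += n;
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Whether this actually removes contention depends on the platform's `FileChannel` implementation, which is the concern raised above about Windows.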
I was trying to load test my web server and kept running into a condition where the web server would become unresponsive even though the load was below one. It turns out Lucene has synchronization blocks around reading the index. It appears this was only necessary to synchronize access to a descriptor, which contains a RandomAccessFile and information about the state of this file. My solution was to use a pool of descriptors so that they could be reused on subsequent reads. During periods of low contention only one or a few Descriptors will be created, but under heavy load many Descriptors can be created to avoid synchronization. After creating and applying my patch, I was able to triple my searching throughput and fully utilize the resources, with the CPUs becoming the new bottleneck. My patch modifies FSDirectory directly, but I'm not entirely sure that's the proper implementation. I'd like to help resolve this synchronization issue for other Lucene users, so please let me know how I can help.
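The descriptor-pool approach described above might look roughly like the following sketch. This is not the contents of lucene.patch; `DescriptorPool`, `readAt`, and `idleCount` are hypothetical names chosen for illustration. A reader borrows an idle `RandomAccessFile` if one is available and opens a new one otherwise, so no two threads ever share a file pointer and no lock is held during I/O.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration class; not the code from the attached patch.
final class DescriptorPool implements AutoCloseable {
    private final File file;
    // Lock-free queue of descriptors that are currently not in use.
    private final ConcurrentLinkedQueue<RandomAccessFile> idle = new ConcurrentLinkedQueue<>();

    DescriptorPool(File file) {
        this.file = file;
    }

    /** Reads exactly {@code length} bytes at {@code position} using a private descriptor. */
    byte[] readAt(long position, int length) throws IOException {
        RandomAccessFile raf = idle.poll();
        if (raf == null) {
            // Pool is empty (all descriptors busy): open another one.
            raf = new RandomAccessFile(file, "r");
        }
        try {
            byte[] out = new byte[length];
            raf.seek(position);
            raf.readFully(out); // throws EOFException if the file is too short
            return out;
        } finally {
            idle.offer(raf); // return the descriptor for reuse
        }
    }

    /** Number of descriptors currently idle in the pool. */
    int idleCount() {
        return idle.size();
    }

    /** Closes idle descriptors; assumes no reads are in flight. */
    @Override
    public void close() throws IOException {
        RandomAccessFile raf;
        while ((raf = idle.poll()) != null) {
            raf.close();
        }
    }
}
```

With sequential callers only one descriptor is ever created, matching the low-contention behaviour described above. The pool in this sketch is unbounded, so under heavy load it can open as many descriptors as there are concurrent readers, which is the file-descriptor exhaustion risk raised in the comments.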
Migrated from LUCENE-1337 by Brian Gardner, resolved Jul 17 2008
Environment:
Attachments: lucene.patch