[PATCH] improve searching under high concurrency [LUCENE-1337] #2414

Closed
asfimport opened this issue Jul 16, 2008 · 7 comments


@asfimport

I was trying to load test my web server and kept running into a condition where the web server would become unresponsive even though the load was below one. It turns out Lucene has synchronized blocks around reading the index. It appears this was only necessary to synchronize access to a descriptor which contains a RandomAccessFile and information about the state of this file. My solution was to use a pool of descriptors so that they could be reused on subsequent reads. During periods of low contention only one or a few Descriptors will be created, but under heavy load many Descriptors can be created to avoid synchronization. After creating and applying my patch, I was able to triple my searching throughput and fully utilize the resources, with the CPUs becoming the new bottleneck. My patch modifies FSDirectory directly, but I'm not entirely sure that's the proper implementation. I'd like to help resolve this synchronization issue for other Lucene users, so please let me know how I can help.
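
A minimal sketch of the descriptor-pool idea described above, for readers who want to see its shape. This is not the code from lucene.patch: the class name `DescriptorPool` and its methods are hypothetical, and the pool is deliberately unbounded, so it opens one file descriptor per concurrent reader. A real implementation would probably cap the pool size.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; not the attached patch.
final class DescriptorPool {
    private final File file;
    // Idle descriptors waiting for reuse. The queue itself is lock-free.
    private final ConcurrentLinkedQueue<RandomAccessFile> idle =
            new ConcurrentLinkedQueue<>();

    DescriptorPool(File file) {
        this.file = file;
    }

    // Reuse an idle descriptor if there is one, otherwise open a new one.
    RandomAccessFile acquire() throws IOException {
        RandomAccessFile raf = idle.poll();
        return raf != null ? raf : new RandomAccessFile(file, "r");
    }

    // Hand a descriptor back so a later read can reuse it.
    void release(RandomAccessFile raf) {
        idle.offer(raf);
    }

    // Read len bytes starting at file position pos. Each caller works on
    // its own descriptor, so seek + read needs no synchronized block.
    void read(long pos, byte[] dst, int off, int len) throws IOException {
        RandomAccessFile raf = acquire();
        try {
            raf.seek(pos);
            raf.readFully(dst, off, len);
        } finally {
            release(raf);
        }
    }

    // Number of descriptors currently idle in the pool.
    int idleCount() {
        return idle.size();
    }

    // Close every idle descriptor.
    void close() throws IOException {
        RandomAccessFile raf;
        while ((raf = idle.poll()) != null) {
            raf.close();
        }
    }
}
```

Under low contention the same descriptor keeps cycling through `acquire` and `release`; only when several threads read at once does the pool grow.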


Migrated from LUCENE-1337 by Brian Gardner, resolved Jul 17 2008
Environment: Linux

Attachments: lucene.patch

@asfimport

Brian Gardner (migrated from JIRA)

This patch applies to version 2.3.1

@asfimport

asfimport commented Jul 16, 2008

Yonik Seeley (@yonik) (migrated from JIRA)

Thanks Brian, also see #1828 for more history and a bunch of options.

@asfimport

asfimport commented Jul 17, 2008

Michael McCandless (@mikemccand) (migrated from JIRA)

Duplicate of #1828.

@asfimport

Jason Rutherglen (migrated from JIRA)

The problem is the same but the solution is not. Do they each need separate patches listing more specifically how they solved the problem? Each solution has pluses and minuses. The NIOFSDirectory doesn't work on Windows. DescriptorsFSDirectory will quickly max out the file descriptors on many Lucene installations.

I would like to see both committed to trunk. MMapDirectory is in the trunk and it has limitations as well, mainly that (at least as I understand it) it loads all the files into RAM.

@asfimport

asfimport commented Jul 17, 2008

Michael McCandless (@mikemccand) (migrated from JIRA)

Jason, are you thinking of #1492 (NIOFSDirectory)?

@asfimport

asfimport commented Jul 17, 2008

Jason Rutherglen (migrated from JIRA)

Yonik checked in a modification of FSDirectory into #1828. I took that code and made NIOFSDirectory which is standalone so that it can be committed. It is checked into #1828 as lucene-753.patch.

@asfimport

asfimport commented Jul 19, 2008

Michael McCandless (@mikemccand) (migrated from JIRA)

> Yonik checked in a modification of FSDirectory into #1828. I took that code and made NIOFSDirectory which is standalone so that it can be committed. It is checked into #1828 as lucene-753.patch.

OK. I think it's a good idea to separately offer an FSDirectory implementation that uses positional reads (via FileChannel) to avoid synchronization.
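
For reference, a positional read through `FileChannel` looks roughly like the sketch below. The class `PositionalReader` is a hypothetical name and this is not the NIOFSDirectory code from the patch. It uses `FileChannel.open`, which is only available in newer Java versions than the ones current when this thread was written. The point is that `FileChannel.read(ByteBuffer, long)` takes an absolute position and does not move the channel's own position, so threads sharing one channel need no seek-then-read lock.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the NIOFSDirectory patch.
final class PositionalReader implements AutoCloseable {
    private final FileChannel channel;

    PositionalReader(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    // Fill dst[off .. off+len) from absolute file position pos.
    // No shared file pointer is touched, so no synchronization is needed here.
    void readFully(long pos, byte[] dst, int off, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(dst, off, len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("read past EOF at position " + pos);
            }
            pos += n; // a single read may return fewer bytes than requested
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```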

I'd also like to somehow make that implementation the default on those platforms (all except Windows?) where there are clear concurrency gains. I.e., maybe change FSDirectory.getDirectory to return NIOFSDirectory if it's not on Windows, but also offer a getDirectory that takes the IMPL so you can force it to pick a different IMPL. In general I think Lucene should default to good out-of-the-box performance, i.e., without requiring special knowledge/tuning on the user's part, so long as there's no difficult tradeoff.
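
A sketch of what such a platform-based default with an explicit override might look like. Everything here is hypothetical: the enum `DirectoryImpl`, the class `DirectoryChooser`, and the `os.name` prefix check are assumptions for illustration and not an actual FSDirectory API.

```java
import java.util.Locale;

// Hypothetical names; they stand in for the implementations discussed here.
enum DirectoryImpl { SIMPLE, NIO }

final class DirectoryChooser {
    private DirectoryChooser() {}

    // Pick the default for a given OS name: the synchronized implementation
    // on Windows, the positional-read implementation everywhere else.
    static DirectoryImpl defaultFor(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? DirectoryImpl.SIMPLE : DirectoryImpl.NIO;
    }

    // No preference given: use the platform default.
    static DirectoryImpl choose() {
        return choose(null);
    }

    // A non-null argument forces that implementation; null means "default".
    static DirectoryImpl choose(DirectoryImpl forced) {
        return forced != null ? forced : defaultFor(System.getProperty("os.name"));
    }
}
```

With this shape, callers who do nothing get the faster implementation where it helps, and anyone who hits a platform problem can still force the other one.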

We should probably change the name to something less generic than "nio", though I can't think of an alternative offhand.

But one question: it looks like NIOFSIndexInput copies most of BufferedIndexInput source rather than subclassing – why was that? Can we change that back to a subclass, perhaps opening up members of BufferedIndexInput a bit if necessary?
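
To illustrate the subclassing approach asked about above, here is a rough sketch in which the buffering logic lives in one base class and a subclass supplies only the raw read. `BufferedInput` and `ByteArrayInput` are hypothetical stand-ins and do not reproduce the real BufferedIndexInput or its members. A file-backed subclass would implement `readInternal` with a positional read instead of an array copy.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a buffered base class.
abstract class BufferedInput {
    private final byte[] buffer;
    private long bufferStart = 0; // file position of buffer[0]
    private int bufferLength = 0; // number of valid bytes in buffer
    private int bufferPos = 0;    // index of the next byte to return

    protected BufferedInput(int bufferSize) {
        buffer = new byte[bufferSize];
    }

    // The only hooks a subclass has to supply.
    protected abstract int readInternal(long pos, byte[] dst, int off, int len)
            throws IOException;

    protected abstract long length();

    // Return the next byte, refilling the buffer when it is exhausted.
    final byte readByte() throws IOException {
        if (bufferPos >= bufferLength) {
            refill();
        }
        return buffer[bufferPos++];
    }

    private void refill() throws IOException {
        long start = bufferStart + bufferPos;
        if (start >= length()) {
            throw new EOFException("read past EOF");
        }
        int want = (int) Math.min(buffer.length, length() - start);
        int got = 0;
        while (got < want) {
            int n = readInternal(start + got, buffer, got, want - got);
            if (n <= 0) {
                throw new EOFException("unexpected EOF");
            }
            got += n;
        }
        bufferStart = start;
        bufferLength = got;
        bufferPos = 0;
    }
}

// The subclass implements the raw read; all buffering is inherited.
final class ByteArrayInput extends BufferedInput {
    private final byte[] data;

    ByteArrayInput(byte[] data, int bufferSize) {
        super(bufferSize);
        this.data = data;
    }

    @Override
    protected int readInternal(long pos, byte[] dst, int off, int len) {
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, dst, off, n);
        return n;
    }

    @Override
    protected long length() {
        return data.length;
    }
}
```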
