Java NIO patch against Lucene 1.9 [LUCENE-414] #1492

Closed
asfimport opened this issue Jul 24, 2005 · 6 comments

Comments

@asfimport

Robert Engels previously submitted a patch against Lucene 1.4 for a Java NIO-
based Directory implementation. It also included some changes to FSDirectory
to allow better concurrency when searching from multiple threads. The
complete thread is at:

http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/%3cLMENLAOACIBLMOIILNNNEEOEEPAA.rengels@ix.netcom.com%3e

This thread ended with Doug Cutting suggesting that someone port Robert's
changes to the SVN trunk. This is what I've done in this patch.

There are two parts to the patch. The first part modifies FieldsReader,
CompoundFileReader, and SegmentReader, to allow better concurrency when
reading an index. The second part includes the new NioFSDirectory
implementation, and makes small changes to FSDirectory and IndexInput to
accommodate this change. I'll put a more detailed outline of the changes to
each file in a separate message.

To use the new NioFSDirectory, set the system property
org.apache.lucene.FSDirectory.class to
org.apache.lucene.store.NioFSDirectory. This will cause
FSDirectory.getDirectory() to return an NioFSDirectory instance. By default,
NioFile limits the number of concurrent channels to 4, but you can override
this by setting the system property org.apache.lucene.nio.channels.
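As a rough illustration of the configuration described above, the two properties could be set programmatically before the directory is opened. Only the property names and the NioFSDirectory class name come from the description; the helper class, its method name, and the channel count of 8 are hypothetical. I believe stock FSDirectory reads its property in a static initializer, so the sketch assumes the properties must be set before FSDirectory is first loaded.

```java
/** Hypothetical helper; only the two property names come from the patch description. */
final class NioDirectoryConfig {
    static final String DIRECTORY_CLASS_PROPERTY = "org.apache.lucene.FSDirectory.class";
    static final String CHANNELS_PROPERTY = "org.apache.lucene.nio.channels";

    /** Selects NioFSDirectory and overrides the default limit of 4 channels. */
    static void enableNio(int channels) {
        System.setProperty(DIRECTORY_CLASS_PROPERTY, "org.apache.lucene.store.NioFSDirectory");
        System.setProperty(CHANNELS_PROPERTY, Integer.toString(channels));
    }

    public static void main(String[] args) {
        enableNio(8);
        // With the patched Lucene on the classpath, the next step would be
        // something like (assumption, not compiled here):
        //   Directory dir = FSDirectory.getDirectory("/path/to/index", false);
        System.out.println(System.getProperty(DIRECTORY_CLASS_PROPERTY));
    }
}
```

The same effect can be had without code by passing `-Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NioFSDirectory` and `-Dorg.apache.lucene.nio.channels=8` on the java command line.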

I did some performance tests with these patches. The biggest improvement came
from the concurrency improvements. NioFSDirectory performed about the same as
FSDirectory (with the concurrency improvements).

I ran my tests under Fedora Core 1; uname -a reports:
Linux myhost 2.4.22-1.2199.nptlsmp #1 SMP Wed Aug 4 11:48:29 EDT 2004 i686
i686 i386 GNU/Linux

The machine is a dual Xeon 2.8GHz with 4GB RAM, and the tests were run against
a 9GB compound index file. The tests were run "hot" – with everything
already cached by Linux's filesystem cache. The numbers are:

FSDirectory without patch: 13.3 searches per second
FSDirectory WITH concurrency patch: 14.3 searches per second

Both tests were run with 6 concurrent threads, which gave the highest numbers
in each case. I suspect that the concurrency improvements would make a bigger
difference on a more realistic test where the index isn't all cached in RAM
already, since the I/O happens while holding the synchronized lock. Patches to
follow...
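To make the last point concrete, here is a minimal sketch of the pattern being described: one file handle shared by all threads, where the seek and the read have to happen as a pair and therefore under one lock. This is not the actual Lucene code; `LockedReader` and its methods are hypothetical names, and the sketch uses the modern java.nio.file helpers only to build a temp file for the demo.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for an input that shares one file handle among all threads. */
final class LockedReader implements AutoCloseable {
    private final RandomAccessFile file;

    LockedReader(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
    }

    /** Fills dest starting at the given offset. */
    void readFully(long position, byte[] dest) throws IOException {
        // seek() and readFully() must not interleave with another thread's pair,
        // so the lock is held for the whole disk access, not just the bookkeeping.
        synchronized (file) {
            file.seek(position);
            file.readFully(dest);
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Writes ten bytes to a temp file and reads four of them back from offset 3. */
    static String demo() {
        try {
            Path path = Files.createTempFile("locked-reader", ".bin");
            try {
                Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (LockedReader reader = new LockedReader(path)) {
                    byte[] dest = new byte[4];
                    reader.readFully(3, dest);
                    return new String(dest, StandardCharsets.US_ASCII);
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

When the data is already in the OS cache the locked section is short, which fits the small gain measured in the hot test; when a read has to wait for the disk, every other thread waits with it.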

Thoughts?


Migrated from LUCENE-414 by Chris Lamprecht, 3 votes, resolved Sep 22 2008
Environment:

Operating System: All
Platform: All

Attachments: ASF.LICENSE.NOT.GRANTED--MemoryLRUCache.java, ASF.LICENSE.NOT.GRANTED--NioFile.java, ASF.LICENSE.NOT.GRANTED--nio-lucene-1.9.patch

@asfimport
Author

Chris Lamprecht (migrated from JIRA)

Created an attachment (id=15757)
Concurrency and NIO patch for lucene SVN trunk

NioFile: new class
NioFSDirectory: new class, extends FSDirectory, uses java's NIO classes

FSDirectory - changes for NioFSDirectory

  • made init() method protected so NioFSDirectory can call it

IndexInput - changes for NioFSDirectory

  • add readBytes() method that takes a position while preserving the
    IndexInput's position (overridden in NioInputStream)
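A positional read of this kind maps naturally onto `FileChannel.read(ByteBuffer, long)`, which reads at an absolute offset and leaves the channel's own position alone. The sketch below is an assumption about how such a method could look, not the code from the patch; `PositionalRead` and its methods are hypothetical, and java.nio.file is used only to keep the demo short.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a read that takes an explicit position. */
final class PositionalRead {
    /** Fills dest from the given absolute offset without moving the channel position. */
    static void readBytes(FileChannel channel, long position, byte[] dest) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(dest);
        long offset = position;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, offset); // absolute read, no seek needed
            if (n < 0) {
                throw new EOFException("read past end of file");
            }
            offset += n;
        }
    }

    /** Reads four bytes at offset 3 and reports the channel position afterwards. */
    static String demo() {
        try {
            Path path = Files.createTempFile("positional-read", ".bin");
            try {
                Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                    byte[] dest = new byte[4];
                    readBytes(channel, 3, dest);
                    return new String(dest, StandardCharsets.US_ASCII) + "@" + channel.position();
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because no shared position is mutated, several threads can call `readBytes` on the same channel without a lock around the seek-and-read pair.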

CONCURRENCY IMPROVEMENTS


FieldsReader - concurrency improvements

  • add and use ThreadStream inner class for istream and fstream (ThreadLocal
    variables)
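The per-thread stream idea can be sketched with a `ThreadLocal` that lazily creates one private copy per thread. This is a generic illustration rather than the ThreadStream class from the patch; `PerThreadStream` is a hypothetical name and a `StringBuilder` stands in for a cloned IndexInput.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical analogue of per-thread streams: each thread lazily gets its own copy. */
final class PerThreadStream<T> {
    private final ThreadLocal<T> local;

    PerThreadStream(Supplier<T> cloner) {
        this.local = ThreadLocal.withInitial(cloner);
    }

    /** Returns the calling thread's private instance, creating it on first use. */
    T get() {
        return local.get();
    }

    /** Three threads each call get() twice; returns how many copies were created. */
    static int demo() {
        AtomicInteger clones = new AtomicInteger();
        PerThreadStream<StringBuilder> streams = new PerThreadStream<>(() -> {
            clones.incrementAndGet();
            return new StringBuilder(); // stand-in for a cloned stream
        });
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                streams.get();
                streams.get(); // second call reuses the same per-thread copy
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return clones.get();
    }
}
```

One trade-off to keep in mind: a per-thread copy generally stays reachable until its thread exits or the `ThreadLocal` itself is collected, so long-lived thread pools hold on to their copies.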

CompoundFileReader - concurrency improvements

  • remove synchronized block in readInternal(). The synchronization is now in
    the IndexInput implementation.
    This allows subclasses (such as NioFSDirectory.NioIndexInput) to NOT use
    synchronized blocks for better concurrency.

SegmentReader - concurrency improvement

  • removed synchronized from isDeleted() by getting a copy of the reference
    instead
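The "copy of the reference" technique can be sketched as follows. This is an illustration under several assumptions, not the SegmentReader change itself: `DeletedDocs` is a hypothetical class, `java.util.BitSet` stands in for Lucene's own bit vector, the field is declared `volatile` so readers see the latest reference, and writers replace the bit set copy-on-write so a reader never sees one being modified.

```java
import java.util.BitSet;

/** Hypothetical sketch: lock-free isDeleted() via a local copy of the reference. */
final class DeletedDocs {
    // Assumption of this sketch: volatile, so readers see the most recent reference.
    // null means "no deletions".
    private volatile BitSet deleted;

    boolean isDeleted(int doc) {
        BitSet snapshot = deleted; // read the field once; it may be swapped concurrently
        return snapshot != null && snapshot.get(doc);
    }

    /** Writers still synchronize, and publish a fresh copy instead of mutating in place. */
    synchronized void delete(int doc) {
        BitSet current = deleted;
        BitSet copy = (current == null) ? new BitSet() : (BitSet) current.clone();
        copy.set(doc);
        deleted = copy;
    }

    synchronized void undeleteAll() {
        deleted = null;
    }
}
```

Reading the field into a local first is what avoids a null check and a dereference hitting two different values when another thread clears the deletions in between.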

@asfimport
Author

robert engels (migrated from JIRA)

(In reply to comment #0)

I've attached an improved NioFile and caching mechanism. One of the problems
with the earlier implementation was that the cache size could grow enormous
(since it was a certain percentage of all outstanding segments).

The attached cache is shared by all segments and has an upper bound.
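For readers who have not opened the attachment, one common way to get a shared cache with a hard upper bound is an access-ordered `LinkedHashMap` that evicts its eldest entry. The sketch below shows that general technique only; it is not the attached MemoryLRUCache, and the class name, the string key format, and the block granularity are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU cache shared by all files, keyed by file name and block. */
final class SharedBlockCache {
    private final Map<String, byte[]> blocks;

    SharedBlockCache(final int maxBlocks) {
        // accessOrder=true makes iteration order least-recently-used first;
        // removeEldestEntry enforces the upper bound on every insert.
        this.blocks = Collections.synchronizedMap(
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > maxBlocks;
                }
            });
    }

    private static String key(String fileName, long blockNumber) {
        return fileName + ":" + blockNumber;
    }

    /** Returns the cached block, or null on a miss. */
    byte[] get(String fileName, long blockNumber) {
        return blocks.get(key(fileName, blockNumber));
    }

    void put(String fileName, long blockNumber, byte[] data) {
        blocks.put(key(fileName, blockNumber), data);
    }

    int blockCount() {
        return blocks.size();
    }
}
```

With a single instance shared across segments, total memory is bounded by `maxBlocks` times the block size, regardless of how many segments are open.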

As a note, your performance improvement numbers do not jibe with what I have
seen. How many simultaneous threads are you using? What was the cache size?

@asfimport
Author

robert engels (migrated from JIRA)

Created an attachment (id=15781)
NioFile with shared cache

@asfimport
Author

robert engels (migrated from JIRA)

Created an attachment (id=15782)
shared cache with multi-segment keys

@asfimport
Author

Doug Cutting (@cutting) (migrated from JIRA)

The channels should all be opened when the IndexInput is created, as files can subsequently get deleted.

Also, I'm not sure why this uses nio. Classic io would also permit you to have multiple file handles per file, for more parallel io. So you could just patch FSDirectory to permit that, no?

Finally, if files are on a single drive, then the concurrency improvements are probably negligible. This would only really pay off with a RAID, where different parts of a file are stored on different physical devices. Or am I missing something?
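The first two points above (open every handle up front, and use several classic-I/O handles per file) might be combined roughly as in the sketch below. It is only an illustration of the suggestion, not a proposed FSDirectory patch; `HandlePool`, its methods, and the pool size are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical pool of classic-I/O handles on one file, all opened eagerly. */
final class HandlePool implements AutoCloseable {
    private final BlockingQueue<RandomAccessFile> idle;

    HandlePool(Path path, int handles) {
        idle = new ArrayBlockingQueue<>(handles);
        try {
            // Open everything now, so a later delete of the file cannot make an open fail.
            for (int i = 0; i < handles; i++) {
                idle.add(new RandomAccessFile(path.toFile(), "r"));
            }
        } catch (IOException e) {
            close();
            throw new UncheckedIOException(e);
        }
    }

    /** Borrows a handle, reads at the given offset, and returns the handle to the pool. */
    void readFully(long position, byte[] dest) {
        RandomAccessFile handle;
        try {
            handle = idle.take(); // blocks only when every handle is busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        try {
            handle.seek(position);
            handle.readFully(dest);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            idle.add(handle);
        }
    }

    @Override
    public void close() {
        RandomAccessFile handle;
        while ((handle = idle.poll()) != null) {
            try {
                handle.close();
            } catch (IOException ignored) {
                // best effort on shutdown
            }
        }
    }

    /** Writes ten bytes to a temp file and reads four back through a two-handle pool. */
    static String demo() {
        try {
            Path path = Files.createTempFile("handle-pool", ".bin");
            try {
                Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (HandlePool pool = new HandlePool(path, 2)) {
                    byte[] dest = new byte[4];
                    pool.readFully(3, dest);
                    return new String(dest, StandardCharsets.US_ASCII);
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Up to `handles` threads can read in parallel with this layout, each with its own file position, which is the effect the NIO channels were meant to provide.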

@asfimport
Author

asfimport commented Sep 22, 2008

Michael McCandless (@mikemccand) (migrated from JIRA)

I believe most of this is a dup of #1828, and/or separately incorporated into Lucene already.
