Use NIO positional read to avoid synchronization in FSIndexInput [LUCENE-753] #1828

Closed
asfimport opened this issue Dec 19, 2006 · 82 comments


As suggested by Doug, we could use NIO pread to avoid synchronization on the underlying file.
This could mitigate any MT performance drop caused by reducing the number of files in the index format.


Migrated from LUCENE-753 by Yonik Seeley (@yonik), 5 votes, resolved Jan 25 2011
Attachments: FileReadTest.java (versions: 8), FSDirectoryPool.patch, FSIndexInput.patch (versions: 2), lucene-753.patch (versions: 2), LUCENE-753.patch (versions: 5)
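
For illustration, a minimal sketch (not the attached patch) of the positional-read idea: FileChannel.read(ByteBuffer, long) reads at an absolute file offset without moving the channel's position, so concurrent readers need no shared lock around seek()+read(). Class and method names here are illustrative only.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PreadSketch {
    /** Fills dest from the file starting at filePosition, without touching the channel's position. */
    static int readAt(FileChannel channel, byte[] dest, long filePosition) throws IOException {
        ByteBuffer wrapped = ByteBuffer.wrap(dest);
        int total = 0;
        while (wrapped.hasRemaining()) {
            int n = channel.read(wrapped, filePosition + total); // positional read ("pread")
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            byte[] buf = new byte[1024];
            System.out.println("read " + readAt(channel, buf, 0L) + " bytes");
        }
    }
}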


Yonik Seeley (@yonik) (migrated from JIRA)

Patch for FSIndexInput to use a positional read call that doesn't use explicit synchronization. Note that the implementation of that read call may still involve some synchronization depending on the JVM and OS (notably Windows which lacks a native pread AFAIK).


Yonik Seeley (@yonik) (migrated from JIRA)

This change should be faster on heavily loaded multi-threaded servers using the non-compound index format.
Performance tests are needed to see if there is any negative impact on single-threaded performance.

Compound index format (CSIndexInput) still does synchronization because the base IndexInput is not cloned (and hence shared by all CSIndexInput clones). It's unclear if getting rid of the synchronization is worth the cloning overhead in this case.


Doug Cutting (@cutting) (migrated from JIRA)

This patch continues to use BufferedIndexInput and allocates a new ByteBuffer for each call to read(). I wonder if it might be more efficient to instead directly extend IndexInput and always represent the buffer as a ByteBuffer?


Yonik Seeley (@yonik) (migrated from JIRA)

CSIndexInput synchronization could also be eliminated if a pread method were added to IndexInput:

public abstract void readBytes(byte[] b, int offset, int len, long fileposition)

Unfortunately, that would break any custom Directory based implementations out there, and we can't provide a suitable default with seek & read because we don't know what object to synchronize on.
Worth it or not???
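
To make the objection concrete, here is a hedged sketch (hypothetical names, not Lucene API) of the default implementation that would have to be provided: emulating the positional read with seek()+read() forces the base class to pick a lock, and "this" is the wrong lock once clones share one underlying file.

import java.io.IOException;

abstract class PositionalInputSketch {
    protected abstract long position() throws IOException;
    protected abstract void doSeek(long pos) throws IOException;
    protected abstract void doRead(byte[] b, int offset, int len) throws IOException;

    // Emulated pread: only safe if every clone sharing the file synchronizes
    // on the same monitor, which this base class has no way to know about.
    public void readBytes(byte[] b, int offset, int len, long filePosition) throws IOException {
        synchronized (this) { // "this" differs per clone, so this lock is not enough
            long saved = position();
            doSeek(filePosition);
            doRead(b, offset, len);
            doSeek(saved);
        }
    }
}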


Yonik Seeley (@yonik) (migrated from JIRA)

Here is a patch that directly extends IndexInput to make things a little easier.
I started with the code for BufferedIndexInput to avoid any bugs in read().
They share enough code that a common subclass could be factored out if desired (or changes made in BufferedIndexInput to enable easier sharing).

ByteBuffer does have offset, length, etc, but I did not use them because BufferedIndexInput currently allocates the byte[] on demand, and thus would add additional checks to readByte(). Also, the NIO Buffer.get() isn't as efficient as our own array access.
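
To illustrate the Buffer.get() point with a hedged, hypothetical snippet (not the patch): when a heap ByteBuffer wraps our own byte[], readByte() can index the backing array directly rather than go through ByteBuffer.get(), which re-checks position and limit on every call.

import java.nio.ByteBuffer;

final class BufferAccessSketch {
    private final byte[] buffer = new byte[1024];
    private final ByteBuffer wrapped = ByteBuffer.wrap(buffer); // heap buffer backed by our array
    private int pos;

    // Direct array access, as BufferedIndexInput effectively does today.
    byte readByteFromArray() {
        return buffer[pos++];
    }

    // The same read through the NIO API; functionally identical, but each
    // get() performs its own index check against the buffer's limit.
    byte readByteViaBuffer() {
        return wrapped.get(pos++);
    }
}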


Bogdan Ghidireac (migrated from JIRA)

You can find a NIO variation of IndexInput attached to this issue: #1597

I had good results on multiprocessor machines under heavy load.

Regards,
Bogdan


Yonik Seeley (@yonik) (migrated from JIRA)

Thanks for the pointer Bogdan, it's interesting you use transferTo instead of read... is there any advantage to this? You still need to create a new object every read(), but at least it looks like a smaller object.

It's also been pointed out to me that #1492 has some more NIO code.


Bogdan Ghidireac (migrated from JIRA)

The Javadoc says that transferTo can be more efficient because the OS can transfer bytes directly from the filesystem cache to the target channel without actually copying them.
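
For reference, a hedged sketch of what reading through transferTo() looks like; the ByteArraySink target channel here is hypothetical and stands in for whatever the attached code uses, it is not the actual implementation from #1597.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class TransferToSketch {
    // Destination channel that drains transferred bytes into a caller-supplied array.
    static final class ByteArraySink implements WritableByteChannel {
        private final byte[] dest;
        private int written;
        ByteArraySink(byte[] dest) { this.dest = dest; }
        public int write(ByteBuffer src) {
            int n = Math.min(src.remaining(), dest.length - written);
            src.get(dest, written, n);
            written += n;
            return n;
        }
        public boolean isOpen() { return true; }
        public void close() {}
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            byte[] buf = new byte[1024];
            long n = channel.transferTo(0L, buf.length, new ByteArraySink(buf));
            System.out.println("transferred " + n + " bytes");
        }
    }
}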


Yonik Seeley (@yonik) (migrated from JIRA)

> The Javadoc says that transferTo can be more efficient because the OS can transfer bytes
> directly from the filesystem cache to the target channel without actually copying them.

Unfortunately, only for DirectByteBuffers and other FileChannels, not for HeapByteBuffers.
Sounds like we just need to do some benchmarking, but I have a bad feeling that all the checking overhead Sun added to NIO will cause it to be slower in the single threaded case.


Yonik Seeley (@yonik) (migrated from JIRA)

Attaching test that reads a file in different ways, either random access or serially, from a number of threads.


Yonik Seeley (@yonik) (migrated from JIRA)

Single-threaded random access performance of a fully cached 64MB file on my home PC (WinXP), Java 6:

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=6518936
answer=81332126, ms=7781, MB/sec=167.5603649916463

config: impl=ChannelFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=6518936
answer=81332126, ms=9203, MB/sec=141.66980332500273

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=1024 filelen=6518936
answer=81332126, ms=11672, MB/sec=111.70212474297463

config: impl=ChannelTransfer serial=false nThreads=1 iterations=200 bufsize=1024 filelen=6518936
answer=81332126, ms=17328, MB/sec=75.2416435826408


Brian Pinkerton (migrated from JIRA)

Most of my workloads would benefit by removing the synchronization in FSIndexInput, so I took a closer look at this issue. I found exactly the opposite results that Yonik did on two platforms that I use frequently in production (Solaris and Linux), and by a significant margin. I even get the same behavior on the Mac, though I'm not running Java6 there.

$ uname -a
Linux xxx 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18 18:39:27 EDT 2005 i686 i686 i386 GNU/Linux
$ java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Client VM (build 1.6.0_02-b05, mixed mode, sharing)

config: impl=ChannelPread serial=false nThreads=200 iterations=10 bufsize=1024 filelen=10485760
answer=0, ms=88543, MB/sec=236.85124741650947
config: impl=ClassicFile serial=false nThreads=200 iterations=10 bufsize=1024 filelen=10485760
answer=0, ms=150560, MB/sec=139.29011689691816

$ uname -a
SunOS xxx 5.10 Generic_118844-26 i86pc i386 i86pc
$ java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)

config: impl=ChannelPread serial=false nThreads=200 iterations=10 bufsize=1024 filelen=10485760
answer=0, ms=39621, MB/sec=529.3031473208652

config: impl=ClassicFile serial=false nThreads=200 iterations=10 bufsize=1024 filelen=10485760
answer=0, ms=119057, MB/sec=176.14688762525515


Yonik Seeley (@yonik) (migrated from JIRA)

Brad, one possible difference is the number of threads we tested with.
I tested single-threaded (nThreads=1) to see what kind of slowdown a single query might see.

A normal production system shouldn't see 200 concurrent running search threads unless it's just about to fall over, or unless it's one of those massive multi-core systems. After you pass a certain amount of parallelism, NIO can help.


Brian Pinkerton (migrated from JIRA)

Whoops; I should have paid more attention to the args. The results in the single-threaded case still favor pread, but by a slimmer margin:

Linux:

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=10485760
answer=0, ms=9983, MB/sec=210.0723229490133

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=1024 filelen=10485760
answer=0, ms=9247, MB/sec=226.7926895209257

Solaris 10:

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=10485760
answer=0, ms=7381, MB/sec=284.12843788104595

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=1024 filelen=10485760
answer=0, ms=6245, MB/sec=335.81297037630105

Mac OS X:

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=1024 filelen=10485760
answer=-914995, ms=19945, MB/sec=105.14675357232389

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=10485760
answer=-914995, ms=26378, MB/sec=79.50382894836606


Doug Cutting (@cutting) (migrated from JIRA)

> Brad, [...]

That's Brian. And right, the difference in your tests is the number of threads.

Perhaps this is a case where one size will not fit all. MmapDirectory is fastest on 64-bit platforms with lots of threads, while good-old-FSDirectory has always been fastest for single-threaded access. Perhaps a PreadDirectory would be the Directory of choice for multi-threaded access of large indexes on 32-bit hardware? It would be useful to benchmark this patch against MmapDirectory, since they both remove synchronization.
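
For comparison, a minimal sketch of the memory-mapped approach (the idea behind MMapDirectory, not its actual code): the file is mapped once and each reader positions its own duplicate of the buffer, so no locking is needed.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapReadSketch {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            // Maps the whole file; this sketch assumes it fits in a single (< 2 GB) mapping.
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            ByteBuffer view = mapped.duplicate(); // each reader gets its own independent position
            byte[] dest = new byte[1024];
            int n = Math.min(dest.length, view.remaining());
            view.get(dest, 0, n);
            System.out.println("read " + n + " bytes via mmap");
        }
    }
}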


Doug Cutting (@cutting) (migrated from JIRA)

My prior remarks were posted before I saw Brian's latest benchmarks.

While it would still be good to throw mmap into the mix, pread now looks like a strong contender for the one that might beat all. It works well on 32-bit hardware, it's unsynchronized, and it's fast. What's not to like?


Yonik Seeley (@yonik) (migrated from JIRA)

Weird... I'm still getting slower results from pread on WinXP.
Can someone else verify on a windows box?

Yonik@spidey ~
$ c:/opt/jdk16/bin/java -server FileReadTest testfile ClassicFile false 1 200
config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=9616000
answer=160759732, ms=14984, MB/sec=128.35024025627337

$ c:/opt/jdk16/bin/java -server FileReadTest testfile ClassicFile false 1 200
config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=1024 filelen=9616000
answer=160759732, ms=14640, MB/sec=131.36612021857923


$ c:/opt/jdk16/bin/java -server FileReadTest testfile ChannelPread false 1 200
config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=1024 filelen=9616000
answer=160759732, ms=21766, MB/sec=88.35798952494717

$ c:/opt/jdk16/bin/java -server FileReadTest testfile ChannelPread false 1 200
config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=1024 filelen=9616000
answer=160759732, ms=21718, MB/sec=88.55327378211622


$ c:/opt/jdk16/bin/java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b06)
Java HotSpot(TM) Client VM (build 1.6.0_02-b06, mixed mode)


robert engels (migrated from JIRA)

I sent this via email, but probably need to add to the thread...

I posted a bug on this to Sun a long while back. This is a KNOWN BUG on Windows.

NIO preads actually sync behind the scenes on some platforms. Using multiple file descriptors is much faster.

See bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734


Doug Cutting (@cutting) (migrated from JIRA)

So it looks like pread is ~50% slower on Windows, and ~5-25% faster on other platforms. Is that enough of a difference that it might be worth having FSDirectory use different implementations of FSIndexInput based on the value of Constants.WINDOWS (and perhaps JAVA_VERSION)?
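
A hedged sketch of what that conditional default could look like; PlatformDefaults and usePread() are hypothetical names, and the OS check simply mirrors what Lucene's Constants.WINDOWS flag reports.

final class PlatformDefaults {
    private static final boolean WINDOWS =
        System.getProperty("os.name", "").startsWith("Windows");

    /** True if the unsynchronized positional-read FSIndexInput should be the default. */
    static boolean usePread() {
        // pread measured ~50% slower on Windows (Sun bug 6265734), so keep the
        // classic synchronized seek+read implementation there.
        return !WINDOWS;
    }
}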


Michael McCandless (@mikemccand) (migrated from JIRA)

> Is that enough of a difference that it might be worth having FSDirectory use different implementations of FSIndexInput based on the value of Constants.WINDOWS (and perhaps JAVA_VERSION)?

+1

I think having good out-of-the-box defaults is extremely important (most users won't tune), and given the substantial cross platform differences here I think we should conditionalize the defaults according to the platform.


robert engels (migrated from JIRA)

As an aside, if the Lucene people voted on the Java bug (and/or sent emails via the proper channels), hopefully the underlying bug can be fixed in the JVM.

In my opinion it is a serious one: it ruins any performance gains from using NIO on files.


Yonik Seeley (@yonik) (migrated from JIRA)

Updated test that fixes some thread synchronization issues to ensure that the "answer" is the same for all methods.

Brian, in some of your tests the answer is "0"... is this because your test file consists of zeros (created via /dev/zero or equiv)? UNIX systems treat blocks of zeros differently than normal files (they are stored as holes). It shouldn't make too much of a difference in this case, but just to be sure, could you try with a real file?


Brian Pinkerton (migrated from JIRA)

Yeah, the file was full of zeroes. But I created the files w/o holes and was using filesystems that don't compress file contents. Just to be sure, though, I repeated the tests with a file with random contents; the results above still hold.


Brian Pinkerton (migrated from JIRA)

BTW, I think the performance win with Yonik's patch for some workloads could be far greater than what the simple benchmark illustrates. Sure, pread might be marginally faster. But the real win is avoiding synchronized access to the file.

I did some IO tracing a while back on one particular workload that is characterized by:

  • a small number of large compound indexes
  • short average execution time, particularly compared to disk response time
  • a 99+% FS cache hit rate
  • cache misses that tend to cluster on rare queries

In this workload where each query hits each compound index, the locking in FSIndexInput means that a single rare query clobbers the response time for all queries. The requests to read cached data are serialized (fairly, even) with those that hit the disk. While we can't get rid of the rare queries, we can allow the common ones to proceed against cached data right away.


Michael McCandless (@mikemccand) (migrated from JIRA)

I ran Yonik's most recent FileReadTest.java on the platforms below, testing single-threaded random access for a fully cached 64 MB file.

I tested two Windows XP Pro machines and got opposite results from Yonik. Yonik, is your machine XP Home?

I'm showing ChannelTransfer to be much faster on all platforms except Windows Server 2003 R2 Enterprise x64, where it's about the same as ChannelPread and ChannelFile.

The ChannelTransfer test is giving the wrong checksum, but I think that's just a bug in how the checksum is computed (it's using "len", which with ChannelTransfer is just the chunk size written on each call to write). So I think the MB/sec is still correct.

Mac OS X 10.4 (Sun java 1.5)
config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=32565, MB/sec=412.15331797942576

config: impl=ChannelFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=19512, MB/sec=687.8727347273473

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=19492, MB/sec=688.5785347835009

config: impl=ChannelTransfer serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=147783, ms=16009, MB/sec=838.3892060715847

Linux 2.6.22.1 (Sun java 1.5)
config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=37879, MB/sec=354.33281765622115

config: impl=ChannelFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=21845, MB/sec=614.4093751430535

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=21902, MB/sec=612.8103734818737

config: impl=ChannelTransfer serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=147783, ms=15978, MB/sec=840.015821754913

Windows Server 2003 R2 Enterprise x64 (Sun java 1.6)

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=32703, MB/sec=410.4141149130049

config: impl=ChannelFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=23344, MB/sec=574.9559972583961

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=23329, MB/sec=575.3256804835183

config: impl=ChannelTransfer serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=147783, ms=23422, MB/sec=573.0412774314747

Windows XP Pro SP2, laptop (Sun Java 1.5)

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=71253, MB/sec=188.36782731955148

config: impl=ChannelFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=57463, MB/sec=233.57243443607192

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=58043, MB/sec=231.23844046655068

config: impl=ChannelTransfer serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=147783, ms=20039, MB/sec=669.7825640001995

Windows XP Pro SP2, older desktop (Sun Java 1.6)

config: impl=ClassicFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=53047, MB/sec=253.01662299470283

config: impl=ChannelFile serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=34047, MB/sec=394.2130819161747

config: impl=ChannelPread serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=-44611, ms=34078, MB/sec=393.8544750278772

config: impl=ChannelTransfer serial=false nThreads=1 iterations=200 bufsize=6518936 filelen=67108864
answer=147783, ms=18781, MB/sec=714.6463340610192


Michael McCandless (@mikemccand) (migrated from JIRA)

I also just ran a test with 4 threads, random access, on Linux 2.6.22.1:

config: impl=ClassicFile serial=false nThreads=4 iterations=200 bufsize=6518936 filelen=67108864
answer=-195110, ms=120856, MB/sec=444.22363142913883

config: impl=ChannelFile serial=false nThreads=4 iterations=200 bufsize=6518936 filelen=67108864
answer=-195110, ms=88272, MB/sec=608.2006887801341

config: impl=ChannelPread serial=false nThreads=4 iterations=200 bufsize=6518936 filelen=67108864
answer=-195110, ms=77672, MB/sec=691.2026367288084

config: impl=ChannelTransfer serial=false nThreads=4 iterations=200 bufsize=6518936 filelen=67108864
answer=594875, ms=38390, MB/sec=1398.465517061735

ChannelTransfer got even faster (scales up with added threads better).


Yonik Seeley (@yonik) (migrated from JIRA)

Mike, it looks like you are running with a bufsize of 6.5MB!
Apologies for my hard-to-use benchmark program :-(


Yonik Seeley (@yonik) (migrated from JIRA)

I'll try fixing the transferTo test before anyone re-runs any tests.


Michael McCandless (@mikemccand) (migrated from JIRA)

Doh!! Woops :) I will rerun...


Yonik Seeley (@yonik) (migrated from JIRA)

OK, uploading latest version of the test that should fix ChannelTransfer (it's also slightly optimized to not create a new object per call).

Well, at least we've learned that printing out all the input params for benchmarking programs is good practice :-)


Michael McCandless (@mikemccand) (migrated from JIRA)

> I can possibly work on this, just go through and reedit the BufferedIndexInput portions of the code. Inheriting is difficult because of the ByteBuffer code. Needs to be done line by line.

That would be awesome, Jason. I think we should then commit NIOFSDirectory to core as at least a way around this bottleneck on all platforms but Windows. Maybe we can do this in time for 2.4?


Matthew Mastracci (migrated from JIRA)

> Matthew, could you give this one a shot to see if it fixes your case? Thanks.

Michael,

I ran this new patch against our big index and it works very well. If I have time, I'll run some benchmarks to see what our real-life performance improvements are like.

Note that I'm only running it for our read-only snapshot of the index, however, so this hasn't been tested for writing to a large index.


Michael McCandless (@mikemccand) (migrated from JIRA)

> I ran this new patch against our big index and it works very well. If I have time, I'll run some benchmarks to see what our real-life performance improvements are like.

Super, thanks! This was the same index that would reliably hit the exception above?


Matthew Mastracci (migrated from JIRA)

> Super, thanks! This was the same index that would reliably hit the exception above?

Correct - it would hit the exception every time at startup.

I've been running NIOFSDirectory for the last couple of hours with zero exceptions (except for running out of file descriptors after starting it the first time :)). The previous incarnation was running MMapDirectory.

Thanks for all the work on this patch.


Matthew Mastracci (migrated from JIRA)

This exception popped up out of the blue a few hours in. No exceptions before it. I'll see if I can figure out whether it was caused by our index snapshotting or if it's a bug elsewhere in NIOFSDirectory.

I haven't seen any exceptions like this with MMapDirectory, but it's possible there's something that we're doing that isn't correct.

 
Caused by: java.nio.channels.ClosedChannelException
	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:91)
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:616)
	at com.dotspots.analyzer.index.NIOFSDirectory$NIOFSIndexInput.read(NIOFSDirectory.java:186)
	at com.dotspots.analyzer.index.NIOFSDirectory$NIOFSIndexInput.refill(NIOFSDirectory.java:218)
	at com.dotspots.analyzer.index.NIOFSDirectory$NIOFSIndexInput.readByte(NIOFSDirectory.java:232)
	at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
	at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
	at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
	at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:154)
	at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
	at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:678)
	at org.apache.lucene.index.MultiSegmentReader.docFreq(MultiSegmentReader.java:373)
	at org.apache.lucene.index.MultiReader.docFreq(MultiReader.java:310)
	at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:87)
	at org.apache.lucene.search.Searcher.docFreqs(Searcher.java:178)


Michael McCandless (@mikemccand) (migrated from JIRA)

Interesting...

Are you really sure you're not accidentally closing the searcher before calling Searcher.docFreqs? Are you calling docFreqs directly from your app?

It looks like MMapIndexInput.close() is a no-op, so it would not have detected calling Searcher.docFreqs after close, whereas NIOFSDirectory (and the normal FSDirectory) will.

If you try the normal FSDirectory, do you also see an exception like this?

Incidentally, what sort of performance differences are you noticing between these three different ways of accessing an index in the file system?


Yonik Seeley (@yonik) (migrated from JIRA)

> Maybe we can do this in time for 2.4?

+1

Latest patch is looking good to me!
Is there a reason we don't do lazy allocation in clone() like FSIndexInput?

Also, our finalizers aren't technically thread safe which could lead to a double close in the finalizer (although I doubt if this particular case would ever happen). If we need to keep them, we could change Descriptor.isOpen to volatile and there should be pretty much no cost since it's only checked in close().
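
A minimal sketch of the volatile suggestion, with a hypothetical Descriptor class (not the one in FSIndexInput): the volatile flag makes a close() on one thread visible to the finalizer thread, so the descriptor isn't closed twice.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class DescriptorSketch {
    private final RandomAccessFile file;
    private volatile boolean isOpen; // volatile: the finalizer thread must see close()

    DescriptorSketch(File path) throws IOException {
        file = new RandomAccessFile(path, "r");
        isOpen = true;
    }

    void close() throws IOException {
        if (isOpen) {        // only checked here, so the volatile read costs essentially nothing
            isOpen = false;
            file.close();
        }
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            close();         // backstop if the caller forgot to close
        } finally {
            super.finalize();
        }
    }
}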


Michael McCandless (@mikemccand) (migrated from JIRA)

> Is there a reason we don't do lazy allocation in clone() like FSIndexInput?

Yonik, do you mean BufferedIndexInput.clone (not FSIndexInput)?

I think once we fix NIOFSIndexInput to subclass from BufferedIndexInput, then cloning should be lazy again. Jason, are you working on this (subclassing from BufferedIndexInput)? If not, I can take it.

> Also, our finalizers aren't technically thread safe which could lead to a double close in the finalizer

Hmmm... I'll update both FSDirectory's and NIOFSDirectory's isOpen fields to be volatile.


Jason Rutherglen (migrated from JIRA)

Mike, I have not started on the subclassing from BufferedIndexInput yet. I can work on it Monday though.


Michael McCandless (@mikemccand) (migrated from JIRA)

> Mike, I have not started on the subclassing from BufferedIndexInput yet. I can work on it Monday though.

OK, thanks!


Michael McCandless (@mikemccand) (migrated from JIRA)

Updated patch with Yonik's volatile suggestion – thanks Yonik!

Also, I removed NIOFSDirectory.createOutput since it was doing the same thing as super().


Matthew Mastracci (migrated from JIRA)

Michael,

> Are you really sure you're not accidentally closing the searcher before calling Searcher.docFreqs? Are you calling docFreqs directly from your app?

Our IndexReaders are actually managed in a shared pool (currently 8 IndexReaders, shared round-robin style as requests come in). We have some custom reference counting logic that's supposed to keep the readers alive as long as somebody has them open. As new index snapshots come in, the IndexReaders are re-opened and reference counts ensure that any old index readers in use are kept alive until the searchers are done with them. I'm guessing we have an error in our reference counting logic that just doesn't show up under MMapDirectory (as you mentioned, close() is a no-op).
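
For illustration only (this is not the poster's pool code), a hedged sketch of that kind of reference counting: the reader is closed only when the pool and every in-flight search have released it, so rolling to a new snapshot never closes a reader that is still in use.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.lucene.index.IndexReader;

final class RefCountedReader {
    private final IndexReader reader;
    private final AtomicInteger refCount = new AtomicInteger(1); // the pool holds one reference

    RefCountedReader(IndexReader reader) { this.reader = reader; }

    IndexReader acquire() {
        refCount.incrementAndGet(); // a search takes a reference before using the reader
        return reader;
    }

    void release() throws IOException {
        if (refCount.decrementAndGet() == 0) {
            reader.close(); // last holder (a search, or the pool on snapshot roll) closes it
        }
    }
}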

We're calling docFreqs directly from our app. I'm guessing that it just happens to be the most likely item to be called after we roll to a new index snapshot.

I don't have hard performance numbers right now, but we were having a hard time saturating I/O or CPU with FSDirectory. The locking was basically killing us. When we switched to MMapDirectory and turned on compound files, our performance jumped at least 2x. The preliminary results I'm seeing with NIOFSDirectory seem to indicate that it's slightly faster than MMapDirectory.

I'll try setting our app back to using the old FSDirectory and see if the exceptions still occur. I'll also try to fiddle with our unit tests to make sure we're correctly ref-counting all of our index readers.

BTW, I ran a quick FSDirectory/MMapDirectory/NIOFSDirectory shootout. It uses a parallel benchmark that roughly models what our real-life benchmark is like. I ran the benchmark once through to warm the disk cache, then got the following. The numbers are fairly stable across various runs once the disk caches are warm:

FS: 33644ms
MMap: 28616ms
NIOFS: 33189ms

I'm a bit surprised at the results myself, but I've spent a bit of time tuning the indexes to maximize concurrency. I'll double-check that the benchmark is correctly running all of the tests.

The benchmark effectively runs 10-20 queries in parallel at a time, then waits for all queries to complete. It does this end-to-end for a number of different query batches, then totals up the time to complete each batch.


Jason Rutherglen (migrated from JIRA)

LUCENE-753.patch

NIOFSIndexInput now extends BufferedIndexInput. I was unable to test it, however, and wanted to just get this up.


Michael McCandless (@mikemccand) (migrated from JIRA)

> FS: 33644ms
> MMap: 28616ms
> NIOFS: 33189ms
>
> I'm a bit surprised at the results myself, but I've spent a bit of time tuning the indexes to maximize concurrency. I'll double-check that the benchmark is correctly running all of the tests.

This is surprising – your benchmark is very concurrent, yet FSDir and NIOFSDir are close to the same net throughput, while MMapDir is quite a bit faster. Is this on a non-Windows OS?


Michael McCandless (@mikemccand) (migrated from JIRA)

New patch attached. Matthew, if you could try this version out on your index, that'd be awesome.

I didn't like how we were still copying the hairy readBytes & refill methods from BufferedIndexInput, so I made some small additional mods to BufferedIndexInput to notify the subclass when a byte[] buffer gets allocated, which then allowed us to fully inherit these methods.

But then I realized we were duplicating a lot of code from FSIndexInput, so I switched to subclassing that instead, and that made things even simpler.

Some other things also fixed:

  • We were ignoring bufferSize (e.g. setBufferSize).

  • We weren't closing the FileChannel.

  • clone() now lazily clones the buffer again.

To test this, I made NIOFSDirectory the default IMPL in FSDirectory.getDirectory and ran all tests. One test failed at first (because we were ignoring setBufferSize calls); with the new patch, all tests pass.

I also built the first 150K docs of Wikipedia and ran various searches using NIOFSDirectory, and all seems good.

The class is quite a bit simpler now; however, there's one thing I don't like: when you use CFS, the NIOFSIndexInput.readInternal method will wrap the CSIndexInput's byte[] (from its parent BufferedIndexInput class) for every call (every 1024 bytes read from the file). I'd really like to find a clean way to reuse a single ByteBuffer. Not yet sure how to do that though...
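
One hedged possibility (hypothetical helper, not the committed patch) is to cache the wrapper and only re-wrap when the underlying byte[] changes, resetting position/limit on each call instead of allocating a new ByteBuffer:

import java.nio.ByteBuffer;

final class BufferWrapCache {
    private ByteBuffer cached; // reused wrapper over the current byte[]

    ByteBuffer wrap(byte[] buffer, int offset, int len) {
        if (cached == null || cached.array() != buffer) {
            cached = ByteBuffer.wrap(buffer); // re-wrap only when the array itself changes
        }
        cached.limit(offset + len);
        cached.position(offset);
        return cached;
    }
}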


Michael McCandless (@mikemccand) (migrated from JIRA)

New version attached. This one re-uses a wrapped byte buffer even when it's CSIndexInput that's calling it.

I plan to commit in a day or two.


Michael McCandless (@mikemccand) (migrated from JIRA)

I just committed revision 690539, adding NIOFSDirectory. I will leave this open, but move off of 2.4, until we can get similar performance gains on Windows...


robert engels (migrated from JIRA)

Sun is accepting outside bug fixes to the OpenJDK and merging them into the commercial JDK (in most cases).

If the underlying bug is fixed in the Windows JDK - not too hard - then you can fix this properly in Lucene.

If you don't fix it in the JDK, you are always going to be trading the 'running out of file handles' problem against the 'locked position' synchronization; there is no way to fix this in user code...


Yonik Seeley (@yonik) (migrated from JIRA)

Attaching new FileReadTest.java that fixes a concurrency bug in SeparateFile - each reader needed its own file position.
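
A hedged sketch of the idea behind that fix (illustrative names, not the attached FileReadTest code): each reader thread gets its own RandomAccessFile, so races on a shared file position from seek() disappear.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class PerThreadFileSketch {
    private final ThreadLocal<RandomAccessFile> files;

    PerThreadFileSketch(File path) {
        files = ThreadLocal.withInitial(() -> {
            try {
                return new RandomAccessFile(path, "r"); // private descriptor per thread
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
    }

    int readAt(byte[] dest, long pos) throws IOException {
        RandomAccessFile raf = files.get();
        raf.seek(pos);          // safe: no other thread shares this file position
        return raf.read(dest);
    }
}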


Uwe Schindler (@uschindler) (migrated from JIRA)

This issue was resolved a long time ago, but left open for the stupid Windows Sun JRE bug which was never resolved. With Lucene 3.x and trunk we have better defaults (use e.g. MMapDirectory on Windows-64).

Users should default to FSDirectory.open() and use the returned directory for best performance.
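
A minimal usage sketch of that recommendation, assuming the Lucene 3.x API: FSDirectory.open() returns the Directory implementation considered best for the current platform, so application code needs no platform checks.

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class OpenDirectorySketch {
    public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.open(new File(args[0])); // picks the per-platform default
        try {
            System.out.println("opened " + dir);
        } finally {
            dir.close();
        }
    }
}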
