RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509
Comments
Paul Elschot (migrated from JIRA) Created an attachment (id=16372) |
cutting@apache.org (@cutting) (migrated from JIRA) This readByte() implementation will probably be slower than private static final EMPTY_BUFFER = new byte[0]; public byte readByte() { public byte readBytes(byte[] dest, int destOffset, int len) { public void seek(long pos) { public long getFilePointer() { private void updateBuffer() { |
Michael Busch (migrated from JIRA) We should fix both RAMInputStream and RAMOutputStream to subclass IndexInput and IndexOutput directly. That saves a lot of unnecessary array copies. I'm attaching a new patch that changes both classes. Unlike Paul's patch, this one keeps the current buffer in a local variable (as Doug suggested). All unit tests pass, including TestTermVectorsReader. The reason this test fails with Paul's patch is that its RAMInputStream does not throw an IOException when EOF is reached. I did some quick tests in which I used a RAMDirectory to build an index. With this patch the test runs in 170 secs; the old version takes 236 secs, which is an improvement of about 28%. |
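For readers following along, here is a minimal sketch of the idea discussed above: an input that reads straight from the RAM file's own blocks instead of copying them into a second buffer, and that throws an IOException at EOF. It is not the attached patch. The class names `RamFileSketch` and `RamInputSketch`, the tiny `BLOCK_SIZE`, and the `append` helper are hypothetical and do not use Lucene's API; `seek` is left out to keep it short.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a RAM-backed file: a list of fixed-size blocks. */
final class RamFileSketch {
    static final int BLOCK_SIZE = 4; // tiny on purpose, so reads cross block boundaries
    final List<byte[]> blocks = new ArrayList<>();
    long length; // number of valid bytes across all blocks

    void append(byte[] data) {
        for (byte b : data) {
            int offset = (int) (length % BLOCK_SIZE);
            if (offset == 0) {
                blocks.add(new byte[BLOCK_SIZE]); // start a new block when the last one is full
            }
            blocks.get(blocks.size() - 1)[offset] = b;
            length++;
        }
    }
}

/** Hypothetical reader that uses the file's blocks directly, with no extra buffer layer. */
final class RamInputSketch {
    private static final byte[] EMPTY_BUFFER = new byte[0];

    private final RamFileSketch file;
    private byte[] currentBuffer = EMPTY_BUFFER; // reference to a file block, never a copy
    private int bufferIndex = -1;   // index of the current block in file.blocks
    private long bufferStart = 0;   // file offset of the first byte of the current block
    private int bufferPosition = 0; // next byte to read within the current block
    private int bufferLimit = 0;    // number of valid bytes in the current block

    RamInputSketch(RamFileSketch file) {
        this.file = file;
    }

    byte readByte() throws IOException {
        if (bufferPosition >= bufferLimit) {
            nextBuffer();
        }
        return currentBuffer[bufferPosition++];
    }

    void readBytes(byte[] dest, int destOffset, int len) throws IOException {
        while (len > 0) {
            if (bufferPosition >= bufferLimit) {
                nextBuffer();
            }
            // One copy from the file block to the caller's array; no intermediate buffer.
            int chunk = Math.min(len, bufferLimit - bufferPosition);
            System.arraycopy(currentBuffer, bufferPosition, dest, destOffset, chunk);
            bufferPosition += chunk;
            destOffset += chunk;
            len -= chunk;
        }
    }

    long getFilePointer() {
        return bufferStart + bufferPosition;
    }

    /** Switches to the next block, or throws if there is no more data. */
    private void nextBuffer() throws IOException {
        long start = (long) (bufferIndex + 1) * RamFileSketch.BLOCK_SIZE;
        if (start >= file.length) {
            throw new IOException("read past EOF");
        }
        bufferIndex++;
        currentBuffer = file.blocks.get(bufferIndex);
        bufferStart = start;
        bufferPosition = 0;
        bufferLimit = (int) Math.min(RamFileSketch.BLOCK_SIZE, file.length - start);
    }
}
```

In this sketch the EOF check lives in the block switch, so the common path of `readByte()` is a single bounds comparison and an array access.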
Michael McCandless (@mikemccand) (migrated from JIRA) Michael, I wasn't able to cleanly apply this patch on the current trunk. I get this: patch -p0 < lucene-431.patch I'd like to test this net performance gain with #1918. I think fixing this plus doing #1918 should make indexing into a RAMDirectory quite a bit faster. |
Michael Busch (migrated from JIRA) Mike, that's strange.... for me the patch applies cleanly on the current trunk. I just tried it again. Anyways, I'm attaching a zip containing the patched files. Now you should be able to test 843 with this one. Let me know if it doesn't work... |
Doug Cutting (@cutting) (migrated from JIRA) > I'd like to test this net performance gain with #1918. Yes, it would be great to see how much each improves things individually as well as combined. |
Michael McCandless (@mikemccand) (migrated from JIRA) >> I'd like to test this net performance gain with #1918. Will do! |
Michael McCandless (@mikemccand) (migrated from JIRA) Michael, the patch problem seems to be something on my end, which I can't yet explain. When I take your zip (thanks!), unzip into a fresh trunk checkout, run 'svn diff', take the output to another fresh trunk checkout, and try to apply that patch, I get the same error. Somehow my version of patch (2.5.4 on Debian) cannot handle the output of 'svn diff'. Spooky! |
Joe Shaw (migrated from JIRA) Michael: mysterious patch failures like that are usually caused by problems with line endings. Try running dos2unix on the patch and then apply it. |
Michael McCandless (@mikemccand) (migrated from JIRA) Thanks for the advice :) Alas, I had already tried that on the original patch and it gives the same error. I remain baffled! |
Michael Busch (migrated from JIRA) Hello Mike, did you get a chance to try this patch out? I'm planning to commit it soon... |
Michael McCandless (@mikemccand) (migrated from JIRA) Yes, I did and it looks good. I would say commit it! |
Michael Busch (migrated from JIRA) Thanks for the quick (7 mins!) response, Mike :-). I just committed it. |
From java-dev, Doug's reply of 12 Sep 2005
on Delaying buffer allocation in BufferedIndexInput:
Paul Elschot wrote:
...
> I noticed that RAMIndexInput extends BufferedIndexInput.
> It has all data in buffers already, so why is there another
> layer of buffering?
No good reason: it's historical.
To avoid this either: (a) the BufferedIndexInput API would need to be
modified to permit subclasses to supply the buffer; or (b)
RAMInputStream could subclass IndexInput directly, using its own
buffers. The latter would probably be simpler.
End of quote.
I made version (b) of RAMInputStream.
Using this RAMInputStream, TestTermVectorsReader was the only
failing test.
Migrated from LUCENE-431 by Paul Elschot, 1 vote, resolved Apr 17 2007
Environment:
Attachments: ASF.LICENSE.NOT.GRANTED--RAMInputStream.java, lucene-431.patch, lucene-431.zip