Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509

Closed
asfimport opened this issue Sep 13, 2005 · 13 comments
Closed

Comments

@asfimport
Copy link

From java-dev, Doug's reply of 12 Sep 2005
on Delaying buffer allocation in BufferedIndexInput:

Paul Elschot wrote:
...
> I noticed that RAMIndexInput extends BufferedIndexInput.
> It has all data in buffers already, so why is there another
> layer of buffering?

No good reason: it's historical.

To avoid this either: (a) the BufferedIndexInput API would need to be
modified to permit subclasses to supply the buffer; or (b)
RAMInputStream could subclass IndexInput directly, using its own
buffers. The latter would probably be simpler.

End of quote.

I made version (b) of RAMInputStream.
Using this RAMInputStream, TestTermVectorsReader failed as the only
failing test.


Migrated from LUCENE-431 by Paul Elschot, 1 vote, resolved Apr 17 2007
Environment:

Operating System: other
Platform: Other

Attachments: ASF.LICENSE.NOT.GRANTED--RAMInputStream.java, lucene-431.patch, lucene-431.zip

@asfimport
Copy link
Author

Paul Elschot (migrated from JIRA)

Created an attachment (id=16372)
RAMInputStream subclassing IndexInput directly

@asfimport
Copy link
Author

cutting@apache.org (@cutting) (migrated from JIRA)

This readByte() implementation will probably be slower than
BufferedIndexInput's. In particular, the call to buffers.elementAt() is not
cheap. It would be faster to cache the current buffer in a field. Fastest
would be to also cache the position in the buffer, something like the following
(untested):

private static final EMPTY_BUFFER = new byte[0];
private byte[] buffer = EMPTY_BUFFER;
private int bufferPosition;

public byte readByte() {
if (bufferPosition == buffer.length) {
pointer += buffer.length;
updateBuffer();
}
return buffer[bufferPosition++];
}

public byte readBytes(byte[] dest, int destOffset, int len) {
int start = pointer + bufferPosition;
... your code ...
updateBuffer();
}

public void seek(long pos) {
pointer = (int)pos;
updateBuffer();
}

public long getFilePointer() {
return pointer + bufferPosition;
}

private void updateBuffer() {
if (pointer < length) {
buffer = file.buffers.elementAt(pointer/BUFFER_SIZE));
bufferPosition = pointer%BUFFER_SIZE;
} else {
buffer = EMPTY_BUFFER;
bufferPosition = 0;
}
}

@asfimport
Copy link
Author

Michael Busch (migrated from JIRA)

We should fix both, RAMInputStream and RAMOutputStream to subclass IndexInput and IndexOutput directly. That saves a lot of unnecessary array copies.

I'm attaching a new patch that changes both classes. Unlike Paul's patch this one keeps the current buffer in a local variable (as Doug suggested).

All unit tests pass including TestTermVectorsReader. The reason why this test failes in Paul's patch is that RAMInputStream does not throw an IOException in case EOF is reached.

I did some quick tests in which I used a RAMDirectory to build an index. With this patch the test runs 170 secs, the old version takes 236 secs, which is an improvement of about 28%.

@asfimport
Copy link
Author

asfimport commented Mar 27, 2007

Michael McCandless (@mikemccand) (migrated from JIRA)

Michael, I wasn't able to cleanly apply this patch on the current trunk. I get this:

patch -p0 < lucene-431.patch
patching file src/java/org/apache/lucene/store/RAMInputStream.java
Hunk #2 FAILED at 21.
1 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMInputStream.java.rej
patching file src/java/org/apache/lucene/store/RAMOutputStream.java
Hunk #1 FAILED at 21.
1 out of 3 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMOutputStream.java.rej
patching file src/test/org/apache/lucene/store/MockRAMOutputStream.java

I'd like to test this net performance gain with #1918. I think fixing this plus doing #1918 should make indexing into a RAMDirectory quite a bit faster.

@asfimport
Copy link
Author

Michael Busch (migrated from JIRA)

Mike,

that's strange.... for me the patch applies cleanly on the current trunk. I just tried it again.

Anyways, I'm attaching a zip containing the patched files. Now you should be able to test 843 with this one. Let me know if it doesn't work...

  • Michael

@asfimport
Copy link
Author

asfimport commented Mar 27, 2007

Doug Cutting (@cutting) (migrated from JIRA)

> I'd like to test this net performance gain with #1918.

Yes, it would be great to see how much each improves things individually as well as combined.

@asfimport
Copy link
Author

asfimport commented Mar 27, 2007

Michael McCandless (@mikemccand) (migrated from JIRA)

>> I'd like to test this net performance gain with #1918.
>
> Yes, it would be great to see how much each improves things individually as well as combined.

Will do!

@asfimport
Copy link
Author

Michael McCandless (@mikemccand) (migrated from JIRA)

Michael, the patch problem seems to be something on my end, which I can't yet explain.

When I take your zip (thanks!), unzip into a fresh trunk checkout, run 'svn diff', take the output to another fresh trunk checkout, and try to apply that patch, I get the same error. Somehow my version of patch (2.5.4 on Debian) cannot handle the output of 'svn diff'. Spooky!

@asfimport
Copy link
Author

Joe Shaw (migrated from JIRA)

Michael: mysterious patch failures like that are usually caused by problems with line endings. Try running dos2unix on the patch and then apply it.

@asfimport
Copy link
Author

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks for the advice :) Alas, I had already tried that on the original patch and it gives the same error. I remain baffled!

@asfimport
Copy link
Author

Michael Busch (migrated from JIRA)

Hello Mike,

did you get a chance to try this patch out? I'm planning to commit it soon...

@asfimport
Copy link
Author

Michael McCandless (@mikemccand) (migrated from JIRA)

Yes, I did and it looks good. I would say commit it!

@asfimport
Copy link
Author

Michael Busch (migrated from JIRA)

Thanks for the quick (7 mins!) response, Mike :-).

I just committed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant