RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509

asfimport · 2005-09-13T04:08:16Z

From java-dev, Doug's reply of 12 Sep 2005
on Delaying buffer allocation in BufferedIndexInput:

Paul Elschot wrote:
...
> I noticed that RAMIndexInput extends BufferedIndexInput.
> It has all data in buffers already, so why is there another
> layer of buffering?

No good reason: it's historical.

To avoid this either: (a) the BufferedIndexInput API would need to be
modified to permit subclasses to supply the buffer; or (b)
RAMInputStream could subclass IndexInput directly, using its own
buffers. The latter would probably be simpler.

End of quote.

I made version (b) of RAMInputStream.
Using this RAMInputStream, TestTermVectorsReader failed as the only
failing test.

Migrated from LUCENE-431 by Paul Elschot, 1 vote, resolved Apr 17 2007
Environment:

Operating System: other
Platform: Other

Attachments: ASF.LICENSE.NOT.GRANTED--RAMInputStream.java, lucene-431.patch, lucene-431.zip

The text was updated successfully, but these errors were encountered:

asfimport · 2005-09-13T04:09:39Z

Paul Elschot (migrated from JIRA)

Created an attachment (id=16372)
RAMInputStream subclassing IndexInput directly

asfimport · 2005-09-13T05:11:48Z

cutting@apache.org (@cutting) (migrated from JIRA)

This readByte() implementation will probably be slower than
BufferedIndexInput's. In particular, the call to buffers.elementAt() is not
cheap. It would be faster to cache the current buffer in a field. Fastest
would be to also cache the position in the buffer, something like the following
(untested):

private static final EMPTY_BUFFER = new byte[0];
private byte[] buffer = EMPTY_BUFFER;
private int bufferPosition;

public byte readByte() {
if (bufferPosition == buffer.length) {
pointer += buffer.length;
updateBuffer();
}
return buffer[bufferPosition++];
}

public byte readBytes(byte[] dest, int destOffset, int len) {
int start = pointer + bufferPosition;
... your code ...
updateBuffer();
}

public void seek(long pos) {
pointer = (int)pos;
updateBuffer();
}

public long getFilePointer() {
return pointer + bufferPosition;
}

private void updateBuffer() {
if (pointer < length) {
buffer = file.buffers.elementAt(pointer/BUFFER_SIZE));
bufferPosition = pointer%BUFFER_SIZE;
} else {
buffer = EMPTY_BUFFER;
bufferPosition = 0;
}
}

asfimport · 2007-03-27T02:32:43Z

Michael Busch (migrated from JIRA)

We should fix both, RAMInputStream and RAMOutputStream to subclass IndexInput and IndexOutput directly. That saves a lot of unnecessary array copies.

I'm attaching a new patch that changes both classes. Unlike Paul's patch this one keeps the current buffer in a local variable (as Doug suggested).

All unit tests pass including TestTermVectorsReader. The reason why this test failes in Paul's patch is that RAMInputStream does not throw an IOException in case EOF is reached.

I did some quick tests in which I used a RAMDirectory to build an index. With this patch the test runs 170 secs, the old version takes 236 secs, which is an improvement of about 28%.

asfimport · 2007-03-27T09:48:21Z

Michael McCandless (@mikemccand) (migrated from JIRA)

Michael, I wasn't able to cleanly apply this patch on the current trunk. I get this:

patch -p0 < lucene-431.patch
patching file src/java/org/apache/lucene/store/RAMInputStream.java
Hunk #2 FAILED at 21.
1 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMInputStream.java.rej
patching file src/java/org/apache/lucene/store/RAMOutputStream.java
Hunk #1 FAILED at 21.
1 out of 3 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMOutputStream.java.rej
patching file src/test/org/apache/lucene/store/MockRAMOutputStream.java

I'd like to test this net performance gain with #1918. I think fixing this plus doing #1918 should make indexing into a RAMDirectory quite a bit faster.

asfimport · 2007-03-27T17:38:04Z

Michael Busch (migrated from JIRA)

Mike,

that's strange.... for me the patch applies cleanly on the current trunk. I just tried it again.

Anyways, I'm attaching a zip containing the patched files. Now you should be able to test 843 with this one. Let me know if it doesn't work...

Michael

asfimport · 2007-03-27T18:35:16Z

Doug Cutting (@cutting) (migrated from JIRA)

> I'd like to test this net performance gain with #1918.

Yes, it would be great to see how much each improves things individually as well as combined.

asfimport · 2007-03-27T18:54:14Z

Michael McCandless (@mikemccand) (migrated from JIRA)

>> I'd like to test this net performance gain with #1918.
>
> Yes, it would be great to see how much each improves things individually as well as combined.

Will do!

asfimport · 2007-03-27T18:56:16Z

Michael McCandless (@mikemccand) (migrated from JIRA)

Michael, the patch problem seems to be something on my end, which I can't yet explain.

When I take your zip (thanks!), unzip into a fresh trunk checkout, run 'svn diff', take the output to another fresh trunk checkout, and try to apply that patch, I get the same error. Somehow my version of patch (2.5.4 on Debian) cannot handle the output of 'svn diff'. Spooky!

asfimport · 2007-03-27T22:41:05Z

Joe Shaw (migrated from JIRA)

Michael: mysterious patch failures like that are usually caused by problems with line endings. Try running dos2unix on the patch and then apply it.

asfimport · 2007-03-27T22:45:02Z

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks for the advice :) Alas, I had already tried that on the original patch and it gives the same error. I remain baffled!

asfimport · 2007-04-17T19:33:15Z

Michael Busch (migrated from JIRA)

Hello Mike,

did you get a chance to try this patch out? I'm planning to commit it soon...

asfimport · 2007-04-17T19:40:31Z

Michael McCandless (@mikemccand) (migrated from JIRA)

Yes, I did and it looks good. I would say commit it!

asfimport · 2007-04-17T20:13:04Z

Michael Busch (migrated from JIRA)

Thanks for the quick (7 mins!) response, Mike :-).

I just committed it.

asfimport closed this as completed Apr 17, 2007

asfimport mentioned this issue Aug 24, 2022

Improve indexing performance by increasing internal buffer sizes [LUCENE-888] #1963

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509

RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509

asfimport commented Sep 13, 2005

asfimport commented Sep 13, 2005

asfimport commented Sep 13, 2005

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007 •

edited

Loading

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007 •

edited

Loading

asfimport commented Mar 27, 2007 •

edited

Loading

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007

asfimport commented Apr 17, 2007

asfimport commented Apr 17, 2007

asfimport commented Apr 17, 2007

RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509

RAMInputStream and RAMOutputStream without further buffering [LUCENE-431] #1509

Comments

asfimport commented Sep 13, 2005

asfimport commented Sep 13, 2005

asfimport commented Sep 13, 2005

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007 • edited Loading

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007 • edited Loading

asfimport commented Mar 27, 2007 • edited Loading

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007

asfimport commented Mar 27, 2007

asfimport commented Apr 17, 2007

asfimport commented Apr 17, 2007

asfimport commented Apr 17, 2007

asfimport commented Mar 27, 2007 •

edited

Loading

asfimport commented Mar 27, 2007 •

edited

Loading

asfimport commented Mar 27, 2007 •

edited

Loading