PARQUET-2432: Use ByteBufferAllocator over hardcoded heap allocation #1278
* Updated BytesInput implementations to rely on a ByteBufferAllocator instance for allocating/releasing ByteBuffer objects.
* Extended the usage of ByteBufferAllocator in place of the hardcoded usage of heap (e.g. byte[], ByteBuffer.allocate, etc.).
* Code related to parquet-cli, including ParquetRewriter and tests, is not changed in this effort.
@wgtmac, if you have some time, could you check this out?
Sure, I will take a look by the end of this week.
I didn't check the tests thoroughly, but overall this LGTM.
@@ -207,10 +211,18 @@ public static BytesInput copy(BytesInput bytesInput) throws IOException {
*/
public abstract void writeAllTo(OutputStream out) throws IOException;

/**
* For internal use only. It is expected that the buffer is large enough to fit the content of this {@link BytesInput}
Should we add a comment for what to expect if the content does not fit into the ByteBuffer?
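For context on the question above, here is a minimal sketch of what plain `java.nio.ByteBuffer` does when the content does not fit: a bulk `put` transfers nothing and throws `BufferOverflowException`. Whether the `BytesInput` method under review surfaces this exception or handles the case differently is not stated in this thread; the class and method names below are hypothetical.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BufferFitDemo {
    /**
     * Hypothetical helper: returns true if writing contentLength bytes into a
     * heap buffer of the given capacity fails with BufferOverflowException.
     */
    static boolean overflowsOnPut(int capacity, int contentLength) {
        ByteBuffer target = ByteBuffer.allocate(capacity);
        try {
            // Bulk put: if contentLength > remaining(), no bytes are transferred.
            target.put(new byte[contentLength]);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("8 bytes into capacity 4 overflows: " + overflowsOnPut(4, 8));
        System.out.println("8 bytes into capacity 8 overflows: " + overflowsOnPut(8, 8));
    }
}
```

A Javadoc note naming the exception (or stating that the behavior is undefined) would answer the question for callers.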
* @return a text representation of the memory usage of this structure
*/
public String memUsageString(String prefix) {
return format("%s %s %d slabs, %,d bytes", prefix, getClass().getSimpleName(), slabs.size(), size);
- return format("%s %s %d slabs, %,d bytes", prefix, getClass().getSimpleName(), slabs.size(), size);
+ return format("%s %s %d slabs, %d bytes", prefix, getClass().getSimpleName(), slabs.size(), size);
I've just copy-pasted this from ConcatenatingByteArrayCollector, but it seems to be intentional. %,d adds grouping separators to the value representation (e.g. 123,456,789).
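The difference between the two format specifiers can be checked with a short, self-contained sketch. The class name is hypothetical, and `Locale.US` is passed explicitly so the separator is deterministic; the code under review calls `format` without a locale, so the separator there follows the JVM's default locale.

```java
import java.util.Locale;

public class FormatFlagDemo {
    /** Formats with the ',' flag, which inserts locale-specific grouping separators. */
    static String grouped(long value) {
        return String.format(Locale.US, "%,d", value);
    }

    /** Formats with plain %d, which emits the digits without separators. */
    static String plain(long value) {
        return String.format(Locale.US, "%d", value);
    }

    public static void main(String[] args) {
        System.out.println(grouped(123456789L)); // 123,456,789
        System.out.println(plain(123456789L));   // 123456789
    }
}
```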
import java.nio.ByteBuffer;

/**
* A special {@link ByteBufferAllocator} implementation that keeps one {@link ByteBuffer} object and reuse it at the
- * A special {@link ByteBufferAllocator} implementation that keeps one {@link ByteBuffer} object and reuse it at the
+ * A special {@link ByteBufferAllocator} implementation that keeps one {@link ByteBuffer} object and reuses it at the
this.allocator = allocator;
this.toRelease = toRelease;
void setReleaser(ByteBufferReleaser releaser) {
this.releaser = releaser;
Should we check if the passed releaser is null?
This is internal (both the method and the class are package private). I wouldn't do additional checks.
int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH;
byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength];
System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length);
// Resetting to the beginning of the footer
from.reset();
Should we check from.markSupported() before calling reset() and mark()?
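One way to act on this suggestion, sketched with plain `java.io` streams: check `markSupported()` and fall back to a `BufferedInputStream` wrapper, which always supports mark/reset. The concrete type of `from` in the PR is not shown in this thread, so the class and helper names below are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class MarkResetDemo {
    /** Returns the stream itself if it supports mark/reset, else a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n bytes and then rewinds, so the same bytes can be read again. */
    static byte[] peek(InputStream in, int n) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        try {
            in.mark(n); // remember the current position; valid for n bytes
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int read = in.read(buf, total, n - total);
                if (read < 0) {
                    break; // end of stream
                }
                total += read;
            }
            in.reset(); // rewind to the marked position
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = ensureMarkSupported(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(Arrays.toString(peek(in, 2)));
        System.out.println(Arrays.toString(peek(in, 2))); // same bytes again after reset
    }
}
```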
allocator);
}

@Deprecated
The argument list grows longer now. Should we use an options class instead to avoid frequent deprecation?
On the one hand, I completely agree. On the other hand, ParquetFileWriter should be an internal class. It is unfortunate that it is public. I would not create yet another parameters builder for ParquetFileWriter.
I'll think about a solution somewhere in between.
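For readers unfamiliar with the pattern being discussed, the following is a generic sketch of an options class with a builder. It is not part of Parquet's API and not what this PR implements; every name and default value here is hypothetical and chosen only for illustration.

```java
public final class WriterOptionsSketch {
    private final long rowGroupSize;
    private final int maxPaddingSize;
    private final boolean checksumEnabled;

    private WriterOptionsSketch(Builder b) {
        this.rowGroupSize = b.rowGroupSize;
        this.maxPaddingSize = b.maxPaddingSize;
        this.checksumEnabled = b.checksumEnabled;
    }

    public long rowGroupSize() { return rowGroupSize; }
    public int maxPaddingSize() { return maxPaddingSize; }
    public boolean checksumEnabled() { return checksumEnabled; }

    public static Builder builder() { return new Builder(); }

    /** New settings become new builder methods; existing constructors stay untouched. */
    public static final class Builder {
        // Illustrative defaults only, not Parquet's actual defaults.
        private long rowGroupSize = 128L * 1024 * 1024;
        private int maxPaddingSize = 8 * 1024 * 1024;
        private boolean checksumEnabled = true;

        public Builder withRowGroupSize(long value) { this.rowGroupSize = value; return this; }
        public Builder withMaxPaddingSize(int value) { this.maxPaddingSize = value; return this; }
        public Builder withChecksumEnabled(boolean value) { this.checksumEnabled = value; return this; }

        public WriterOptionsSketch build() { return new WriterOptionsSketch(this); }
    }

    public static void main(String[] args) {
        WriterOptionsSketch options = builder().withRowGroupSize(64L * 1024 * 1024).build();
        System.out.println(options.rowGroupSize() + " " + options.maxPaddingSize());
    }
}
```

The trade-off raised in the thread still applies: a builder avoids deprecating a constructor each time a parameter is added, at the cost of one more public type to maintain.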
Thank you, @wgtmac