Use hex string as the representation of byte array for queries #4041
byte[] bytes;
// Convert hex string to byte[].
if (rawValue instanceof String) {
  bytes = DatatypeConverter.parseHexBinary((String) rawValue);
Other parts of the code use org.apache.commons.codec.binary.Hex for encoding/decoding byte-array to String. Either one is fine, but all code should use the same encoding/decoding.
sounds good, updated with org.apache.commons.codec.binary.Hex api.
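To illustrate why the reviewer asks for a single codec: `DatatypeConverter.printHexBinary` emits uppercase digits while `Hex.encodeHexString` emits lowercase by default, so mixing the two can produce strings that differ for the same bytes. The sketch below is a stdlib-only stand-in for the round trip, not the code in this PR; the class and method names (`HexRoundTrip`, `encodeHex`, `decodeHex`) are hypothetical.

```java
import java.util.Arrays;

public class HexRoundTrip {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encodes bytes as a lowercase hex string, two characters per byte. */
    public static String encodeHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;            // treat the byte as unsigned
            out[2 * i] = DIGITS[v >>> 4];       // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0F];  // low nibble
        }
        return new String(out);
    }

    /** Decodes a hex string (either case) back into bytes. */
    public static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex characters: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x7f, (byte) 0xca, (byte) 0xfe};
        String hex = encodeHex(original);
        System.out.println(hex);                                    // 007fcafe
        System.out.println(Arrays.equals(original, decodeHex(hex))); // true
    }
}
```

Decoding accepts both cases here, so only the encoding side needs to agree on one convention.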
Codecov Report
@@             Coverage Diff             @@
##             master    #4041     +/-   ##
===========================================
+ Coverage     67.24%    67.3%   +0.06%
  Complexity       20       20
===========================================
  Files          1033     1033
  Lines         51338    51366     +28
  Branches       7181     7191     +10
===========================================
+ Hits          34524    34574     +50
+ Misses        14432    14401     -31
- Partials       2382     2391      +9
can we add a test case?
Modified existing dictionary test to let BYTES dictionary index values contain half
Added minor comments, please address them before committing.
@@ -47,13 +51,24 @@
public BytesOffHeapMutableDictionary(int estimatedCardinality, int maxOverflowHashSize,
    PinotDataBufferMemoryManager memoryManager, String allocationContext, int avgLength) {
  super(estimatedCardinality, maxOverflowHashSize, memoryManager, allocationContext);
  _byteStore = new MutableOffHeapByteArrayStore(memoryManager, allocationContext, estimatedCardinality, avgLength);
  _byteStore = new MutableOffHeapByteArrayStore(memoryManager, allocationContext,
Minor nit: This seems like a code style diff? Please make sure to use the recommended code style: https://pinot.readthedocs.io/en/latest/dev_env.html#intellij
will do
byte[] bytes = (byte[]) rawValue;
byte[] bytes = null;
// Convert hex string to byte[].
if (rawValue instanceof String) {
Will the common case be String or byte[]? The common case should be the first 'if' condition (same in other APIs as well).
The common case is still byte[]. I put the String check first only to shorten the code while keeping the assert for other data types. I will change the if logic as below to move the common case up.
if (rawValue instanceof byte[]) {
  ...
} else if (rawValue instanceof String) {
  ...
} else {
  assert rawValue instanceof byte[];
}
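The branch ordering discussed above can be sketched as a small standalone method: check the common `byte[]` case first, fall back to decoding a hex `String`, and reject anything else. This is a minimal sketch and not the PR's code; `RawValueToBytes`, `toBytes`, and the hand-rolled `decodeHex` helper are hypothetical names, and the helper stands in for whichever hex codec the codebase standardizes on.

```java
public class RawValueToBytes {

    /** Returns the value as byte[], accepting either byte[] or a hex String. */
    public static byte[] toBytes(Object rawValue) {
        if (rawValue instanceof byte[]) {
            // Common case first: the value is already a byte array.
            return (byte[]) rawValue;
        } else if (rawValue instanceof String) {
            // Less common case: the value is a hex-encoded string.
            return decodeHex((String) rawValue);
        } else {
            // Any other type is a caller error.
            String type = rawValue == null ? "null" : rawValue.getClass().getName();
            throw new IllegalArgumentException("Expected byte[] or hex String, got: " + type);
        }
    }

    /** Stand-in hex decoder (assumption: the real code would use a library codec). */
    private static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex characters: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] direct = {0x0a, 0x7f};
        System.out.println(toBytes(direct) == direct);   // true: no copy for byte[]
        System.out.println(toBytes("0a7f").length);      // 2
    }
}
```

The sketch throws an exception in the final branch instead of using `assert`, because Java assertions are disabled at runtime unless the JVM is started with `-ea`; whether that trade-off suits the real code path is a judgment for the maintainers.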
Can you add some motivation behind the change? I am guessing (but not sure) that we want to ingest string data that are known to be hexbytes. Is that right? Is this a part of a bigger design document? If so, can you point me to one (or, add one under https://cwiki.apache.org/confluence/display/PINOT/Design+Documents)
My major motivation is related to this issue: #4040.
Is it not possible to use a string datatype for this? Assuming that the string is byte-array encoded does not seem right to me. Perhaps I am missing something here
This is to match the data in select query. Basically in
…e#4041)
* Use hex string as the representation of byte array for queries
* Adding test for indexing hexstring
* Address comments
@fx19880617 can you please add a query example in https://pinot.readthedocs.io/en/latest/pql_examples.html
Use hex string to represent byte[] value in query.
Refer to #4040
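As a rough illustration of what "hex string in query" means for a client, the sketch below turns a `byte[]` into a quoted hex literal and places it in an equality filter. Everything specific here is an assumption made for illustration: the table name `myTable`, the column name `bytesColumn`, the exact query shape, and the helper names `toHexLiteral` and `equalityQuery`. It is not taken from the Pinot documentation or from this PR.

```java
public class BytesQueryLiteral {

    /** Renders bytes as a single-quoted lowercase hex literal, e.g. 'deadbeef'. */
    public static String toHexLiteral(byte[] value) {
        StringBuilder sb = new StringBuilder(value.length * 2 + 2);
        sb.append('\'');
        for (byte b : value) {
            int v = b & 0xFF;                          // treat the byte as unsigned
            sb.append(Character.forDigit(v >>> 4, 16)); // high nibble
            sb.append(Character.forDigit(v & 0x0F, 16)); // low nibble
        }
        return sb.append('\'').toString();
    }

    /** Builds an equality filter on a BYTES column (hypothetical query shape). */
    public static String equalityQuery(String table, String column, byte[] value) {
        return "SELECT COUNT(*) FROM " + table + " WHERE " + column + " = " + toHexLiteral(value);
    }

    public static void main(String[] args) {
        byte[] key = {(byte) 0xde, (byte) 0xad, (byte) 0xbe, (byte) 0xef};
        // Prints: SELECT COUNT(*) FROM myTable WHERE bytesColumn = 'deadbeef'
        System.out.println(equalityQuery("myTable", "bytesColumn", key));
    }
}
```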