Introduce 64-bit unsigned long field type #60050
Pinging @elastic/es-search (:Search/Mapping)
sorting and aggregations is based on conversion of long values
to double and can be imprecise for large values.
I have not checked the PR yet, but I wonder why sorting cannot be done on the unsigned value. Could we use Long#compareUnsigned in a custom comparator?
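To make the suggestion above concrete, here is a minimal sketch of sorting with Long#compareUnsigned. It assumes unsigned values are held as raw 64-bit patterns in Java longs; the class and method names (UnsignedSortSketch, sortUnsigned) are hypothetical and are not taken from the PR.

```java
import java.util.Arrays;
import java.util.Comparator;

class UnsignedSortSketch {
    // Orders longs by their unsigned interpretation, so bit patterns with the
    // sign bit set (values >= 2^63) sort after all non-negative longs.
    static final Comparator<Long> UNSIGNED_ORDER = Long::compareUnsigned;

    // Returns a sorted copy; the input array is left untouched.
    static Long[] sortUnsigned(Long[] values) {
        Long[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy, UNSIGNED_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        Long[] raw = {
            Long.parseUnsignedLong("18446744073709551615"), // bit pattern of -1L
            1L,
            Long.parseUnsignedLong("9223372036854775808"),  // bit pattern of Long.MIN_VALUE
            0L
        };
        for (Long v : sortUnsigned(raw)) {
            System.out.println(Long.toUnsignedString(v));
        }
    }
}
```

Unlike a conversion to double, this comparison is exact across the whole [0, 18446744073709551615] range.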
@jimczi Thanks, Jim, I will look into this.
8861d6d to c276ffd
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- sorting and aggregations, which are based on conversion of long values to double and can be imprecise for large values
Closes elastic#32434
c276ffd to dffd748
This looks like a great start. I agree with Jim's comment that it would be great if sorting was accurate, like ranges.
libs/x-content/src/main/java/org/elasticsearch/common/xcontent/json/JsonXContentParser.java
...nsigned-long/src/main/java/org/elasticsearch/xpack/unsignedlong/UnsignedLongFieldMapper.java
678b883 to 3b1411b
3b1411b to ada3422
The change looks great @mayya-sharipova! I left some comments but nothing major.
server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java
...ned-long/src/main/java/org/elasticsearch/xpack/unsignedlong/UnsignedLongScriptDocValues.java
I noticed some small points while getting up to speed with the change!
...nsigned-long/src/main/java/org/elasticsearch/xpack/unsignedlong/UnsignedLongFieldMapper.java
- Convert UnsignedLongFieldMapper to a parametrized form
- Small adjustments in UnsignedLongScriptDocValues
Collapse was not working on the unsigned_long field, as collapsing was enabled only on KeywordFieldType and NumberFieldType. This introduces a new method `collapseType` to MappedFieldType, which is checked to decide whether collapsing should be enabled. Relates to elastic#60050
UnsignedLongTests for the range agg was using very specific intervals that the double type cannot distinguish due to lack of precision: 9.223372036854776000E18 == 9.223372036854775807E18 returns true. If we add the corresponding range query test, it returns a different number of hits than the range agg, because the range query, unlike the range agg, doesn't convert values to double and is hence more precise. This patch uses broader ranges for the range agg test (so values converted to doubles don't lose precision), so the corresponding range query returns the same number of hits. Relates to #60050
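The precision loss described in this commit message can be reproduced in plain Java. The sketch below only illustrates long-to-double conversion; the class and helper names (DoublePrecisionSketch, collideAsDouble) are hypothetical and do not come from the test code.

```java
class DoublePrecisionSketch {
    // True when two distinct long values map to the same double.
    static boolean collideAsDouble(long a, long b) {
        return a != b && (double) a == (double) b;
    }

    public static void main(String[] args) {
        // The comparison quoted in the commit message.
        System.out.println(9.223372036854776000E18 == 9.223372036854775807E18);
        // Two distinct neighbours just below 2^63 become the same double.
        System.out.println(collideAsDouble(Long.MAX_VALUE, Long.MAX_VALUE - 1));
        // Small values keep their exact representation.
        System.out.println(collideAsDouble(1000L, 1001L));
    }
}
```

A double has a 53-bit significand, so near 2^63 adjacent representable values are 1024 apart. Range boundaries closer together than that cannot be told apart once converted.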
Max and min aggs were producing wrong results for an unsigned_long field if the field was indexed. If a field is indexed, max/min aggs use values from the indexed Points instead of field data, and these values are derived using the method pointReaderIfPossible. Previously, UnsignedLongFieldType#pointReaderIfPossible produced incorrect values because it failed to shift them back to the original values. This patch fixes pointReaderIfPossible to produce the correct original values. Relates to elastic#60050
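The commit message says the stored values have to be shifted back but does not spell out the encoding. One common way to store unsigned 64-bit values so that signed comparison preserves their order is to flip the sign bit, which is equivalent to subtracting 2^63. The sketch below assumes that scheme; the helper names are hypothetical and this is not the actual pointReaderIfPossible code.

```java
class ShiftedUnsignedSketch {
    // Flipping the sign bit maps unsigned order onto signed order:
    // 0 -> Long.MIN_VALUE, 2^63 -> 0, 2^64 - 1 -> Long.MAX_VALUE.
    static long toSortableSigned(long unsignedBits) {
        return unsignedBits ^ Long.MIN_VALUE;
    }

    // The inverse shift. Skipping this step on read is the kind of bug the
    // commit describes: the reader would hand back the shifted value.
    static long fromSortableSigned(long stored) {
        return stored ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long original = Long.parseUnsignedLong("10");
        long stored = toSortableSigned(original);
        System.out.println("stored (shifted): " + stored);
        System.out.println("decoded: " + Long.toUnsignedString(fromSortableSigned(stored)));
    }
}
```

Under this assumption, a max or min computed directly on the stored form would report a shifted number rather than the value that was indexed.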
Adds support for the unsigned_long type to data frame analytics. This type is handled in the same way as the long type. Values sent to the ML native processes are converted to floats and hence will lose accuracy when outside the range where a float can uniquely represent long values. Relates elastic#60050
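To give a feel for where that accuracy loss starts, the sketch below checks whether an integer survives a round trip through a 32-bit IEEE 754 float. That the ML native processes use 32-bit floats is an assumption here (the message only says "floats"), and the class and method names are hypothetical.

```java
import java.math.BigDecimal;

class FloatPrecisionSketch {
    // True when the float nearest to `value` is exactly `value`.
    // BigDecimal is used for the comparison because casting a float back to
    // long saturates at Long.MAX_VALUE and would hide the rounding.
    static boolean survivesFloat(long value) {
        return new BigDecimal((double) (float) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(survivesFloat(16_777_216L)); // 2^24
        System.out.println(survivesFloat(16_777_217L)); // 2^24 + 1
        System.out.println(survivesFloat(Long.MAX_VALUE));
    }
}
```

A 32-bit float has a 24-bit significand, so every integer up to 2^24 is exact and gaps appear immediately above it.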
This introduces the UNSIGNED_LONG type to QL following its availability in ES (#60050). The type is mapped to a BigInteger whose value is checked against the UL bounds. SQL will now support the type as a literal and in the arithmetic functions; the non-arithmetic functions, however, are unchanged (i.e. they still require a long / int parameter where that is the case). The type is version-gated: for the driver SQL clients (only), the server checks their version and, if it is lower than the one introducing UL support, it fails the request for queries, or simply hides the type in catalog functions (similar to how UNSUPPORTED is currently treated in the similar case). The JDBC tests are adjusted to read the (bwc) version of the driver they are run against and selectively disable part of the tests accordingly. Closes #63312
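As an illustration of checking a BigInteger against the unsigned long bounds, here is a minimal sketch. It is not the QL implementation: the class, constant, and method names are hypothetical, and the choice of ArithmeticException is an assumption made for the example.

```java
import java.math.BigInteger;

class UnsignedLongBoundsSketch {
    // 2^64 - 1, the largest value an unsigned 64-bit integer can hold.
    static final BigInteger UNSIGNED_LONG_MAX =
        BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    static boolean inUnsignedLongRange(BigInteger value) {
        return value.signum() >= 0 && value.compareTo(UNSIGNED_LONG_MAX) <= 0;
    }

    // Returns the value unchanged, or throws if it falls outside [0, 2^64 - 1].
    static BigInteger checkedUnsignedLong(BigInteger value) {
        if (!inUnsignedLongRange(value)) {
            throw new ArithmeticException("unsigned_long overflow: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(UNSIGNED_LONG_MAX);
        System.out.println(inUnsignedLongRange(UNSIGNED_LONG_MAX.add(BigInteger.ONE)));
    }
}
```

A check of this kind would typically run on the result of an arithmetic operation, since BigInteger itself never overflows.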
This field type supports
Sort values, term aggs keys, script values, doc values fields,
source fields return long (for values <=2^63-1)
and BigInteger (for values > 2^63-1).
Closes #32434
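The long-or-BigInteger return convention described above can be sketched as follows, assuming the unsigned value is available as a raw 64-bit pattern in a Java long. The class and method names are hypothetical and are not the field mapper's actual code.

```java
import java.math.BigInteger;

class UnsignedValueSketch {
    // Values up to 2^63 - 1 fit in a signed long and are returned as Long;
    // larger values (sign bit set) are returned as BigInteger.
    static Number toNumber(long unsignedBits) {
        if (unsignedBits >= 0) {
            return Long.valueOf(unsignedBits);
        }
        return new BigInteger(Long.toUnsignedString(unsignedBits));
    }

    public static void main(String[] args) {
        System.out.println(toNumber(42L));
        System.out.println(toNumber(Long.parseUnsignedLong("18446744073709551615")));
    }
}
```

Callers that consume such values need to handle both types, for example by checking `instanceof BigInteger` before doing arithmetic.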