Add support for VERSION field type in SQL and EQL #85502

luigidellaquila
Contributor

Fixes #83375

@luigidellaquila luigidellaquila added >enhancement :Analytics/SQL SQL querying :Analytics/EQL EQL querying Team:QL (Deprecated) Meta label for query languages team labels Mar 30, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.

@luigidellaquila luigidellaquila marked this pull request as draft March 30, 2022 16:04
@luigidellaquila
Contributor Author

@elasticmachine update branch

Contributor

@bpintea bpintea left a comment

Nice work, LGTM!

/**
* Does the provided {@code version} support the version type (PR#85502)?
*/
public static boolean supportsVersionFieldType(Version version) {
Contributor

supportsVersionType()?

@@ -160,6 +163,26 @@ public void testUnsignedLongFiltering() {
}
}

public void testVersionFieldFiltering() {
Contributor

testVersionTypeFiltering()?

@@ -80,7 +80,12 @@ public static ScriptTemplate nullSafeFilter(ScriptTemplate script) {
}

public static ScriptTemplate nullSafeSort(ScriptTemplate script) {
String methodName = script.outputType().isNumeric() ? "nullSafeSortNumeric" : "nullSafeSortString";
String methodName;
Contributor

Is this change needed?

Contributor Author

Just a leftover, fixed

@@ -32,6 +32,7 @@ public enum EsType implements SQLType {
TIME(Types.TIME),
DATETIME(Types.TIMESTAMP),
IP(Types.VARCHAR),
VERSION(Types.VARCHAR),
Contributor

Inserting into the middle of this enum, rather than appending at the end, might raise some bwc concerns (see #65145 (comment))

Contributor Author

Good point, fixed
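As a side note on the enum-ordering point, here is a minimal Java sketch of one way an insertion can differ from an append. It assumes the concern is ordinal stability, which is a guess, since the linked comment is not reproduced in this thread. The enums below are hypothetical stand-ins, not the real `EsType`.

```java
// Hypothetical stand-ins for illustration only; not the real EsType enum.
// Assumption: the bwc concern is that existing constants keep their ordinals.
final class EnumAppendSketch {

    // An enum as it might look in an already released version.
    enum Released { TIME, DATETIME, IP, SHAPE }

    // New constant inserted in the middle: SHAPE moves from ordinal 3 to 4.
    enum Inserted { TIME, DATETIME, IP, VERSION, SHAPE }

    // New constant appended at the end: every existing ordinal is unchanged.
    enum Appended { TIME, DATETIME, IP, SHAPE, VERSION }

    public static void main(String[] args) {
        System.out.println(Released.SHAPE.ordinal());  // 3
        System.out.println(Inserted.SHAPE.ordinal());  // 4
        System.out.println(Appended.SHAPE.ordinal());  // 3
    }
}
```

Anything that persists or exchanges ordinals would see a different meaning for the same number after an insertion, but not after an append.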

@@ -133,6 +134,9 @@ protected Object unwrapCustomValue(Object values) {
// since its later processing will be type dependent. (ex.: negation of UL is only "safe" for 0 values)
return convert(values, UNSIGNED_LONG);
}
if (dataType == VERSION && values instanceof String) {
Contributor

The type check is safe, but I wonder whether it is necessary, since the Version converter will eventually do the same check.

Contributor

👍 also seems redundant to me

// some higher versions
for (int i = 0; i < randomInt(10); i++) {
index("test", "" + (docId++), builder -> {
String versionVal = (2 + randomInt(50)) + "." + randomInt(50) + "." + randomInt(50);
Contributor

Curious if there's any reason not to allow a major of 0 here.

Contributor Author

Nothing very specific, I just needed a random dataset that did not collide with my WHERE conditions.
Looking at it now, testing a full result set is probably overkill, but it still validates that the query returns correct results in a fairly randomized environment.

Comment on lines 2262 to 2270
// bad version value
query = "SELECT name, version from test where version = 'foo'";
doWithQuery(query, results -> {
results.next();
assertEquals("version foo", results.getString("name"));
assertEquals("foo", results.getString("version"));
assertFalse(results.next());
});
}
Contributor

Does this second test add anything to the first one? JDBC has no Version type knowledge so it can only convert to string, so the content doesn't really matter.

Contributor Author

Since there is this distinction between valid and invalid versions, I thought it would be good to have both in the result set test. That said, I agree that, given the implementation details, this test is probably not crucial. I'll refactor it a bit to make it shorter.

@@ -349,6 +351,45 @@ public void testUnsignedLongVersionCompatibility() {
}
}

public void testVersionFieldVersionCompatibility() {
Contributor

testVersionTypeVersionCompatibility()? The query "SELECT '1.2.3'::version ..." does not use a field in the test.

@@ -170,6 +179,30 @@ public void testUnsignedLongFiltering() {
}
}

public void testVersionFieldFiltering() {
Contributor

testVersionTypeFiltering()?

@@ -44,6 +44,8 @@ public final class DataTypes {
public static final DataType DATETIME = new DataType("DATETIME", "date", Long.BYTES, false, false, true);
// ip
public static final DataType IP = new DataType("ip", 45, false, false, true);
// version
public static final DataType VERSION = new DataType("version", Integer.MAX_VALUE, false, false, true);
Contributor

Giving VERSION a size of Integer.MAX_VALUE seems wrong, though I guess that's the theoretical max. Not sure if ES type enforces any limit? It seems there's only a recommendation, but no actual limit: https://semver.org/#does-semver-have-a-size-limit-on-the-version-string

Contributor Author

Technically, any string can be indexed as a Version, so the limit is the same as a KEYWORD.

Contributor

@Luegg Luegg left a comment

👍

[[queries]]
name = "sequenceWithVersionConcat"
query = '''
sequence by transID
Contributor

can a version also be the join key?

Contributor Author

Good point, adding a specific test case

return compare(lN, rN);
}

// automatic conversion for versions
Contributor

is this still needed? Shouldn't by now the runtime type always be Version?

Contributor Author

It depends on how tolerant we want to be with automatic casts in situations like local folding (e.g. WHERE '1.2.0' < '1.11.0'::version) and function evaluation (e.g. IIF(version > '1.1', 1, 0)).

In general, we do not do it for local folding (e.g. for numbers, '2' > 1 returns false), but we do it for field queries (e.g. id > '3' will evaluate to true even if id is numeric; same with version > 1.2).

IMHO this is an inconsistency: the same operation should have the same behavior, locally and in _search.
Since local folding is just an optimization or a fallback in most cases, I tend to consider the automatic cast as the expected behavior.

So from my point of view we should leave it as it is
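For context on the local-folding example above, here is a minimal Java sketch of why the cast changes the result: compared as plain strings, '1.2.0' sorts after '1.11.0', while a version-aware comparison puts it before. The class `VersionCompareSketch` and its `compareAsVersion` helper are hypothetical and handle only dot-separated numeric parts; this is not the Elasticsearch `Version` implementation.

```java
import java.util.Arrays;

// Illustrative only: a simplified numeric-segment comparison with no support
// for pre-release or build suffixes. Not the Elasticsearch Version class.
final class VersionCompareSketch {

    // Compares dot-separated numeric versions segment by segment;
    // missing trailing segments are treated as 0 (a simplifying assumption).
    static int compareAsVersion(String left, String right) {
        int[] l = parse(left);
        int[] r = parse(right);
        int n = Math.max(l.length, r.length);
        for (int i = 0; i < n; i++) {
            int lv = i < l.length ? l[i] : 0;
            int rv = i < r.length ? r[i] : 0;
            if (lv != rv) {
                return Integer.compare(lv, rv);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args) {
        // Plain string comparison: '2' sorts after '1', so "1.2.0" > "1.11.0".
        System.out.println("1.2.0".compareTo("1.11.0") > 0);          // true
        // Version-aware comparison: 2 < 11, so 1.2.0 < 1.11.0.
        System.out.println(compareAsVersion("1.2.0", "1.11.0") < 0);  // true
    }
}
```

Without an automatic cast, the string operand in such a comparison would follow the first rule; with it, the second.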

Contributor

I would tend to not make an exception for Version. Since we have this distinction between local execution and queries it should at least be consistent across types.

@luigidellaquila
Contributor Author

@elasticmachine update branch

@@ -35,7 +35,7 @@
* lexically in ASCII sort order. Numeric identifiers always have lower precedence than non-numeric identifiers.
* </ul>
*/
class VersionEncoder {
public class VersionEncoder {
Member

Since the code touches on code outside QL, please find one of the authors/team responsible for the code to review these changes.
If only constants are being used, it's fine to copy them.

Contributor Author

Good point. These changes are actually a leftover from previous attempts to support Version without impacting the Search/Painless implementation. Since we went with a more complete solution (see #85990), we don't need them anymore.

@@ -62,6 +64,15 @@ public String esType() {
return esType;
}

public ScriptSortBuilder.ScriptSortType scriptSortType() {
if (isNumeric()) {
Member

nit: return isNumeric() ? ScriptSortType.NUMBER : this == DataTypes.VERSION ? ScriptSortType.VERSION : ScriptSortType.STRING
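Spelled out, the suggested single-expression form might look like the sketch below. The enums `Kind` and `SortType` are hypothetical stand-ins for the real `DataType` and `ScriptSortType`, and `isNumeric` here is a simplified placeholder, so that the example compiles on its own.

```java
// Stand-in types for illustration; the real DataType and ScriptSortType
// live in Elasticsearch and are not reproduced here.
final class ScriptSortTypeSketch {

    enum SortType { NUMBER, VERSION, STRING }

    enum Kind { INTEGER, VERSION, KEYWORD }

    // Hypothetical numeric check, standing in for isNumeric() from the diff.
    static boolean isNumeric(Kind kind) {
        return kind == Kind.INTEGER;
    }

    // Chained ternary: numeric first, then version, otherwise string.
    static SortType scriptSortType(Kind kind) {
        return isNumeric(kind) ? SortType.NUMBER
            : kind == Kind.VERSION ? SortType.VERSION
            : SortType.STRING;
    }
}
```

The chained ternary evaluates left to right, so the behavior matches an if / else-if / else ladder with the same conditions.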

@@ -274,7 +274,9 @@ private static XContentBuilder toXContent(ColumnInfo info, XContentBuilder build
* Serializes the provided value in SQL-compatible way based on the client mode
*/
public static XContentBuilder value(XContentBuilder builder, Mode mode, SqlVersion sqlVersion, Object value) throws IOException {
if (value instanceof ZonedDateTime zdt) {
if (value instanceof org.elasticsearch.xpack.versionfield.Version) {
Member

Version is not a widely used type, so it should be checked in an else-if branch rather than in the leading if.

Member

@costin costin left a comment

Requesting changes regarding the impact on mapper-version project.

@luigidellaquila
Contributor Author

@elasticmachine update branch

Member

@costin costin left a comment

Thanks for reviewing the feedback. LGTM!

@luigidellaquila
Contributor Author

@elasticmachine update branch

@luigidellaquila luigidellaquila merged commit 79b0078 into elastic:master Jun 9, 2022
luigidellaquila added a commit that referenced this pull request Jun 9, 2022
@luigidellaquila
Contributor Author

Had to revert due to the following:

java.lang.IllegalArgumentException: can not write type [class org.elasticsearch.xpack.versionfield.Version]
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeGenericValue(StreamOutput.java:828)
        at org.elasticsearch.xpack.ql.expression.gen.processor.ConstantProcessor.writeTo(ConstantProcessor.java:68)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteable(StreamOutput.java:1097)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteableList(StreamOutput.java:1203)
        at org.elasticsearch.xpack.sql.expression.predicate.conditional.ConditionalProcessor.writeTo(ConditionalProcessor.java:73)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteable(StreamOutput.java:1097)
        at org.elasticsearch.xpack.ql.execution.search.extractor.ComputingExtractor.writeTo(ComputingExtractor.java:56)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteable(StreamOutput.java:1097)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteableList(StreamOutput.java:1203)
        at org.elasticsearch.xpack.sql.execution.search.SearchHitCursor.writeTo(SearchHitCursor.java:80)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteable(StreamOutput.java:1097)
        at org.elasticsearch.xpack.sql.session.Cursors.encodeToString(Cursors.java:82)
        at org.elasticsearch.xpack.sql.session.Cursors.encodeToString(Cursors.java:73)
        at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponse(TransportSqlQueryAction.java:225)

The problem seems to be related to how Version objects are translated to strings for serialization, so it should be deterministic, but strangely it is not: the build was green before merging, and I did not see this error during earlier test runs.
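One reading of the trace, offered as an assumption rather than a confirmed diagnosis: a constant holding a Version value reached a generic writer that only knows a fixed set of types. The toy model below (all names hypothetical, no Elasticsearch classes involved) reproduces that failure mode and shows one possible workaround, converting the value to a string before writing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Elasticsearch classes, and this is
// not the actual fix applied to the PR.
final class GenericWriteSketch {

    // Hypothetical stand-in for a custom value type such as a version object.
    static final class VersionValue {
        private final String text;

        VersionValue(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // Generic writer that accepts a fixed set of types and rejects the rest.
    static void writeGenericValue(List<String> out, Object value) {
        if (value instanceof String) {
            out.add("S:" + value);
        } else if (value instanceof Integer) {
            out.add("I:" + value);
        } else {
            throw new IllegalArgumentException("can not write type [" + value.getClass() + "]");
        }
    }

    // Possible workaround: map the custom type to a supported one first.
    static void writeConstant(List<String> out, Object value) {
        Object writable = value instanceof VersionValue ? value.toString() : value;
        writeGenericValue(out, writable);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        writeConstant(out, new VersionValue("1.2.3"));
        System.out.println(out); // [S:1.2.3]
    }
}
```

In this model the failure only shows up when a value of the custom type actually reaches the writer, so a run that never serializes one would pass. Whether that explains the green build here is not established.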

Labels
:Analytics/EQL EQL querying :Analytics/SQL SQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EQL support for fields of type version
9 participants