diff --git a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/In.java b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/In.java
index 104905a03a9ec..f596d589cdde2 100644
--- a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/In.java
+++ b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/In.java
@@ -56,23 +56,39 @@
  * The {@code IN} operator.
  *

  * This function has quite "unique" null handling rules around {@code null} and multivalued
- * fields. Let's use an example: {@code WHERE x IN (a, b, c)}. Here's the per-row
- * decision sequence:
+ * fields. The {@code null} rules are inspired by PostgreSQL, and, presumably, every other
+ * SQL implementation. The multivalue rules are pretty much an extension of the "multivalued
+ * fields are like null in scalars" rule. Here's some examples:
+ *
+ *
+ *
+ * And here's the decision tree for {@code WHERE x IN (a, b, c)}:
  *

  * <ol>
- *     <li>{@code x IS NULL} => {@code null}</li>
- *     <li>{@code a IS NULL AND b IS NULL AND c IS NULL} => {@code null}</li>
- *     <li>{@code MV_COUNT(a) > 1 AND MV_COUNT(a) > 1 AND MV_COUNT(a) > 1} => {@code null}</li>
- *     <li>{@code x == a OR x == b OR x == c} => {@code true}</li>
- *     <li>{@code a IS NULL OR b IS NULL OR c IS NULL} => {@code null}</li>
+ *     <li>{@code x IS NULL} => return {@code null}</li>
+ *     <li>{@code MV_COUNT(x) > 1} => emit a warning and return {@code null}</li>
+ *     <li>{@code a IS NULL AND b IS NULL AND c IS NULL} => return {@code null}</li>
+ *     <li>{@code MV_COUNT(a) > 1 OR MV_COUNT(b) > 1 OR MV_COUNT(c) > 1} => emit a warning and continue</li>
+ *     <li>{@code MV_COUNT(a) > 1 AND MV_COUNT(b) > 1 AND MV_COUNT(c) > 1} => return {@code null}</li>
+ *     <li>{@code x == a OR x == b OR x == c} => return {@code true}</li>
+ *     <li>{@code a IS NULL OR b IS NULL OR c IS NULL} => return {@code null}</li>
  *     <li>{@code else} => {@code false}</li>
  * </ol>
*

- * I believe the first, second, and third entries are *mostly* optimizations and making the
+ * I believe the first five entries are *mostly* optimizations and making the
  * Three-valued logic of SQL
- * explicit and/or work with the evaluators. You could probably shorten this to the last
- * three points, but lots of folks aren't familiar with SQL's three-valued logic anyway, so
- * let's be explicit.
+ * explicit and integrated with our multivalue field rules. And make all that work with the
+ * actual evaluator code. You could probably shorten this to the last three points, but lots
+ * of folks aren't familiar with SQL's three-valued logic anyway, so let's be explicit.
  *

*
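Reviewer note: the new decision tree for {@code WHERE x IN (a, b, c)} can be sketched as plain Java. This is a minimal illustration, not the evaluator code in this PR. The class name `InDecisionTree`, the method `in`, and the `warnings` list are hypothetical. It assumes a field value is modelled as `null` or a non-empty `List<Integer>`, a list with more than one element is multivalued, and a multivalued candidate on the right-hand side is treated like `null` (following the "multivalued fields are like null in scalars" rule quoted above).

```java
import java.util.ArrayList;
import java.util.List;

final class InDecisionTree {
    /** Hypothetical stand-in for the real warning mechanism. */
    static final List<String> warnings = new ArrayList<>();

    /** A non-null value with more than one element counts as multivalued. */
    static boolean isMultivalued(List<Integer> v) {
        return v != null && v.size() > 1;
    }

    /** Returns TRUE, FALSE, or null (SQL NULL) for "x IN (candidates...)". */
    static Boolean in(List<Integer> x, List<List<Integer>> candidates) {
        // 1. x IS NULL => null
        if (x == null) {
            return null;
        }
        // 2. MV_COUNT(x) > 1 => warn and return null
        if (isMultivalued(x)) {
            warnings.add("multivalued left-hand side");
            return null;
        }
        // 3. every candidate is null => null
        if (candidates.stream().allMatch(c -> c == null)) {
            return null;
        }
        // 4. some candidate is multivalued => warn and continue
        if (candidates.stream().anyMatch(InDecisionTree::isMultivalued)) {
            warnings.add("multivalued right-hand side");
        }
        // 5. every candidate is multivalued => null
        if (candidates.stream().allMatch(InDecisionTree::isMultivalued)) {
            return null;
        }
        // 6. any single-valued candidate equals x => true
        boolean sawNullLike = false;
        for (List<Integer> c : candidates) {
            if (c == null || isMultivalued(c)) {
                sawNullLike = true; // assumption: multivalued behaves like null here
                continue;
            }
            if (c.get(0).equals(x.get(0))) {
                return Boolean.TRUE;
            }
        }
        // 7. no match but a null-like candidate was seen => null; 8. else => false
        return sawNullLike ? null : Boolean.FALSE;
    }
}
```

Under these assumptions, `in(List.of(1), Arrays.asList(List.of(2), null))` yields `null` rather than `false`, which is the three-valued-logic case the comment calls out: a non-matching list that contains a `null` cannot prove absence. Note that `List.of` rejects `null` elements, so candidate lists containing `null` need `Arrays.asList`.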

* Because of this chain of logic we don't use the standard evaluator generators. They'd just