
[GLUTEN-6067] Open spark 35 ut #8555

Merged — 4 commits merged into apache:main on Feb 12, 2025
Conversation

baibaichen (Contributor) commented Jan 17, 2025

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

(Fixes: #6067)

How was this patch tested?

(screenshot of the test run attached)

@github-actions github-actions bot added the CORE works for Gluten Core label Jan 17, 2025
github-actions bot commented:

Run Gluten Clickhouse CI on x86

7 similar comments

baibaichen (Contributor, Author) commented:

@CodiumAI-Agent /review

CodiumAI-Agent commented Feb 5, 2025

PR Reviewer Guide 🔍

(Review updated until commit 590892c)

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

6067 - Partially compliant

Compliant requirements:

  • Upgrade Spark version to 3.5 and ensure compilation passes.
  • Upgrade Delta version to 3.2 and ensure compilation passes.
  • Support Scala 2.13.

Non-compliant requirements:

  • Support native write.
  • Ensure CH backend unit tests pass (MergeTree + Delta UT passed).
  • Address specific test failures and fallback scenarios.

Requires further human verification:

  • Support native write.
  • Ensure CH backend unit tests pass (MergeTree + Delta UT passed).
  • Address specific test failures and fallback scenarios.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Test Coverage

Ensure that the added SQL query test lists (SUPPORTED_SQL_QUERY_LIST, OVERWRITE_SQL_QUERY_LIST, IGNORE_SQL_QUERY_LIST) are comprehensive and correctly categorized for the CH backend.

val SUPPORTED_SQL_QUERY_LIST: Set[String] = Set(
  "bitwise.sql",
  "cast.sql",
  "change-column.sql",
  // CH- "charvarchar.sql",
  "columnresolution-negative.sql",
  "columnresolution-views.sql",
  "columnresolution.sql",
  "comments.sql",
  "comparator.sql",
  "count.sql",
  "cross-join.sql",
  "csv-functions.sql",
  // CH- "cte-legacy.sql",
  "cte-nested.sql",
  // CH- "cte-nonlegacy.sql",
  // CH- "cte.sql",
  "current_database_catalog.sql",
  "date.sql",
  "datetime-formatting-invalid.sql",
  // Velox had different handling for some illegal cases.
  // "datetime-formatting-legacy.sql",
  // "datetime-formatting.sql",
  "datetime-legacy.sql",
  "datetime-parsing-invalid.sql",
  "datetime-parsing-legacy.sql",
  "datetime-parsing.sql",
  "datetime-special.sql",
  // CH - "decimalArithmeticOperations.sql",
  // "describe-part-after-analyze.sql",
  "describe-query.sql",
  "describe-table-after-alter-table.sql",
  "describe-table-column.sql",
  "describe.sql",
  "except-all.sql",
  "except.sql",
  "extract.sql",
  "group-by-filter.sql",
  // CH - "group-by-ordinal.sql",
  "grouping_set.sql",
  "having.sql",
  "ignored.sql",
  "ilike-all.sql",
  "ilike-any.sql",
  "inline-table.sql",
  "inner-join.sql",
  "intersect-all.sql",
  "interval.sql",
  "join-empty-relation.sql",
  // CH - "join-lateral.sql",
  // CH - "json-functions.sql",
  "like-all.sql",
  "like-any.sql",
  "limit.sql",
  "literals.sql",
  "map.sql",
  // CH- "misc-functions.sql",
  "natural-join.sql",
  "null-handling.sql",
  // CH- "null-propagation.sql",
  "operators.sql",
  "order-by-nulls-ordering.sql",
  "order-by-ordinal.sql",
  "outer-join.sql",
  "parse-schema-string.sql",
  "pivot.sql",
  "pred-pushdown.sql",
  "predicate-functions.sql",
  "query_regex_column.sql",
  // CH- "random.sql",
  // CH - "regexp-functions.sql",
  "show-create-table.sql",
  "show-tables.sql",
  "show-tblproperties.sql",
  "show-views.sql",
  "show_columns.sql",
  "sql-compatibility-functions.sql",
  "string-functions.sql",
  "struct.sql",
  // CH - "subexp-elimination.sql",
  "table-aliases.sql",
  // CH -"table-valued-functions.sql",
  "tablesample-negative.sql",
  "subquery/exists-subquery/exists-aggregate.sql",
  "subquery/exists-subquery/exists-basic.sql",
  "subquery/exists-subquery/exists-cte.sql",
  "subquery/exists-subquery/exists-having.sql",
  "subquery/exists-subquery/exists-joins-and-set-ops.sql",
  "subquery/exists-subquery/exists-orderby-limit.sql",
  "subquery/exists-subquery/exists-within-and-or.sql",
  "subquery/in-subquery/in-basic.sql",
  "subquery/in-subquery/in-group-by.sql",
  "subquery/in-subquery/in-having.sql",
  "subquery/in-subquery/in-joins.sql",
  "subquery/in-subquery/in-limit.sql",
  "subquery/in-subquery/in-multiple-columns.sql",
  "subquery/in-subquery/in-order-by.sql",
  // CH- "subquery/in-subquery/in-set-operations.sql",
  "subquery/in-subquery/in-with-cte.sql",
  "subquery/in-subquery/nested-not-in.sql",
  "subquery/in-subquery/not-in-group-by.sql",
  "subquery/in-subquery/not-in-joins.sql",
  "subquery/in-subquery/not-in-unit-tests-multi-column.sql",
  "subquery/in-subquery/not-in-unit-tests-multi-column-literal.sql",
  "subquery/in-subquery/not-in-unit-tests-single-column.sql",
  "subquery/in-subquery/not-in-unit-tests-single-column-literal.sql",
  "subquery/in-subquery/simple-in.sql",
  // CH -"subquery/negative-cases/invalid-correlation.sql",
  "subquery/negative-cases/subq-input-typecheck.sql",
  "subquery/scalar-subquery/scalar-subquery-predicate.sql",
  "subquery/scalar-subquery/scalar-subquery-select.sql",
  "subquery/subquery-in-from.sql",
  "postgreSQL/aggregates_part1.sql",
  "postgreSQL/aggregates_part2.sql",
  "postgreSQL/aggregates_part3.sql",
  "postgreSQL/aggregates_part4.sql",
  "postgreSQL/boolean.sql",
  "postgreSQL/case.sql",
  "postgreSQL/comments.sql",
  "postgreSQL/create_view.sql",
  "postgreSQL/date.sql",
  "postgreSQL/float4.sql",
  "postgreSQL/insert.sql",
  "postgreSQL/int2.sql",
  "postgreSQL/int4.sql",
  "postgreSQL/int8.sql",
  "postgreSQL/interval.sql",
  "postgreSQL/join.sql",
  "postgreSQL/limit.sql",
  "postgreSQL/numeric.sql",
  "postgreSQL/select.sql",
  "postgreSQL/select_distinct.sql",
  "postgreSQL/select_having.sql",
  "postgreSQL/select_implicit.sql",
  "postgreSQL/strings.sql",
  "postgreSQL/text.sql",
  "postgreSQL/timestamp.sql",
  "postgreSQL/union.sql",
  "postgreSQL/window_part1.sql",
  "postgreSQL/window_part2.sql",
  "postgreSQL/window_part3.sql",
  "postgreSQL/window_part4.sql",
  "postgreSQL/with.sql",
  "datetime-special.sql",
  "timestamp-ansi.sql",
  "timestamp.sql",
  "arrayJoin.sql",
  "binaryComparison.sql",
  "booleanEquality.sql",
  "caseWhenCoercion.sql",
  "concat.sql",
  "dateTimeOperations.sql",
  "decimalPrecision.sql",
  "division.sql",
  "elt.sql",
  "ifCoercion.sql",
  "implicitTypeCasts.sql",
  "inConversion.sql",
  "mapZipWith.sql",
  "promoteStrings.sql",
  "stringCastAndExpressions.sql",
  "widenSetOperationTypes.sql",
  "windowFrameCoercion.sql",
  "timestamp-ltz.sql",
  "timestamp-ntz.sql",
  "timezone.sql",
  // CH- "transform.sql",
  "try_arithmetic.sql",
  "try_cast.sql",
  "udaf.sql",
  "union.sql",
  "using-join.sql",
  "window.sql",
  "udf-union.sql",
  "udf-window.sql",
  "ansi/cast.sql",
  "ansi/decimalArithmeticOperations.sql",
  "ansi/map.sql",
  "ansi/datetime-parsing-invalid.sql",
  "ansi/string-functions.sql",
  "ansi/interval.sql",
  "ansi/date.sql",
  "ansi/timestamp.sql",
  "ansi/try_arithmetic.sql",
  "ansi/literals.sql",
  "timestampNTZ/timestamp-ansi.sql",
  "timestampNTZ/timestamp.sql",
  "udf/udf-intersect-all.sql - Scala UDF",
  "udf/udf-except-all.sql - Scala UDF",
  "udf/udf-udaf.sql - Scala UDF",
  "udf/udf-except.sql - Scala UDF",
  "udf/udf-pivot.sql - Scala UDF",
  "udf/udf-inline-table.sql - Scala UDF",
  "udf/postgreSQL/udf-select_having.sql - Scala UDF",
  "typeCoercion/native/decimalPrecision.sql",
  "typeCoercion/native/ifCoercion.sql",
  "typeCoercion/native/dateTimeOperations.sql",
  "typeCoercion/native/booleanEquality.sql",
  "typeCoercion/native/mapZipWith.sql",
  "typeCoercion/native/caseWhenCoercion.sql",
  "typeCoercion/native/widenSetOperationTypes.sql",
  "typeCoercion/native/stringCastAndExpressions.sql",
  "typeCoercion/native/inConversion.sql",
  "typeCoercion/native/division.sql",
  "typeCoercion/native/mapconcat.sql"
)

val OVERWRITE_SQL_QUERY_LIST: Set[String] = Set(
  // The calculation formulas for corr, skewness, kurtosis, variance, and stddev in Velox differ
  // slightly from those in Spark, resulting in some differences in the final results.
  // Overwrite below test cases.
  // -- SPARK-24369 multiple distinct aggregations having the same argument set
  // -- Aggregate with nulls.
  "group-by.sql",
  "udf/udf-group-by.sql"
  // Overwrite some results of regr_intercept, regr_r2, corr.
  // CH - "linear-regression.sql"
)

val IGNORE_SQL_QUERY_LIST: List[String] = List(
  "udf/udf-count.sql - Regular Python UDF",
  "udf/udf-except.sql - Regular Python UDF",
  "udf/udf-except-all.sql - Regular Python UDF",
  "udf/udf-natural-join.sql - Regular Python UDF",
  "udf/udf-outer-join.sql - Regular Python UDF",
  "udf/udf-pivot.sql - Regular Python UDF",
  "udf/udf-intersect-all.sql - Regular Python UDF",
  "udf/udf-union.sql - Regular Python UDF",
  "udf/udf-having.sql - Regular Python UDF",
  "udf/udf-group-analytics.sql - Regular Python UDF",
  "udf/udf-group-by.sql - Regular Python UDF",
  // CH excludes following
  "typeCoercion/native/windowFrameCoercion.sql",
  "typeCoercion/native/promoteStrings.sql",
  "typeCoercion/native/concat.sql"
)
Ignored Tests

Verify that the ignored tests added to getIgnoredSQLQueryTests are necessary and do not omit critical functionality.

) ++ otherIgnoreList ++ udafIgnoreList ++
  BackendTestSettings.instance.getSQLQueryTestSettings.getIgnoredSQLQueryTests
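
The quoted expression concatenates the suite-local ignore lists with the ignored SQL query tests supplied by the backend's test settings. A minimal sketch of that merge, where `mergedIgnoreList` and `backendIgnored` are hypothetical names standing in for the surrounding suite code and the `BackendTestSettings.instance.getSQLQueryTestSettings.getIgnoredSQLQueryTests` call:

```scala
object IgnoreListSketch {
  // `otherIgnoreList` and `udafIgnoreList` are named in the snippet above;
  // `backendIgnored` stands in for the backend-provided list.
  def mergedIgnoreList(
      base: List[String],
      otherIgnoreList: List[String],
      udafIgnoreList: List[String],
      backendIgnored: List[String]): List[String] =
    base ++ otherIgnoreList ++ udafIgnoreList ++ backendIgnored

  def main(args: Array[String]): Unit = {
    val merged = mergedIgnoreList(
      List("udf/udf-count.sql - Regular Python UDF"),
      List.empty,
      List.empty,
      List("typeCoercion/native/concat.sql"))
    println(merged.mkString(", "))
  }
}
```

Because the lists are simply appended, a test ignored by any one of the sources stays ignored for the whole suite run.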
Backend Test Settings

Confirm that the new methods includeCH and excludeCH are correctly implemented and integrated with the test suite.

def includeCH(testNames: String*): SuiteSettings = {
  this
}
def excludeCH(testNames: String*): SuiteSettings = {
  exclude(testNames: _*)
  this
}
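
As the snippet shows, `includeCH` is deliberately a no-op while `excludeCH` forwards to `exclude`. A self-contained sketch of that behavior (a toy stand-in, not the real Gluten `SuiteSettings` class):

```scala
import scala.collection.mutable

object SuiteSettingsSketch {
  class SuiteSettings {
    val excludedTests: mutable.Set[String] = mutable.Set.empty

    def exclude(testNames: String*): SuiteSettings = {
      excludedTests ++= testNames
      this
    }

    // Marker only: records nothing; its purpose is to make CH-supported
    // cases visible where the settings are declared.
    def includeCH(testNames: String*): SuiteSettings = this

    // Forwards to the ordinary exclusion mechanism.
    def excludeCH(testNames: String*): SuiteSettings = {
      exclude(testNames: _*)
      this
    }
  }

  def main(args: Array[String]): Unit = {
    val settings = new SuiteSettings
    settings
      .includeCH("cast.sql")      // leaves state untouched
      .excludeCH("transform.sql") // lands in the exclusion set
    println(settings.excludedTests.mkString(","))
  }
}
```

This is the point the reviewers discuss below: calling `includeCH` changes nothing at runtime, which is intentional but not obvious from the name alone.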


CodiumAI-Agent commented:

Persistent review updated to latest commit 590892c

@baibaichen baibaichen marked this pull request as ready for review February 11, 2025 07:08
@baibaichen baibaichen changed the title [GLUTEN-6067] [WIP] open spark 35 ut [GLUTEN-6067] Open spark 35 ut Feb 11, 2025

Comment on lines +101 to +103
def includeCH(testNames: String*): SuiteSettings = {
  this
}
Contributor commented:

Nothing happens here?

baibaichen (Contributor, Author) commented:

Yes, it's used to indicate that this case is supported by the CH backend.

lgbo-ustc (Contributor) commented Feb 12, 2025:

I just think the method names are not general across backends; they are only used in the CH test settings. How about this?

def includeBackend(backendName: String, testNames: String*)
def excludeBackend(backendName: String, testNames: String*)
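
One possible reading of this suggestion is to key the marker and the exclusion on a backend name rather than hard-coding CH. A hedged sketch under that assumption (the class, the `currentBackend` parameter, and the backend-name strings are all hypothetical, not the real Gluten API):

```scala
import scala.collection.mutable

object BackendSettingsSketch {
  class SuiteSettings(currentBackend: String) {
    val excludedTests: mutable.Set[String] = mutable.Set.empty

    def exclude(testNames: String*): SuiteSettings = {
      excludedTests ++= testNames
      this
    }

    // Marker: only documents that `backendName` supports these cases.
    def includeBackend(backendName: String, testNames: String*): SuiteSettings =
      this

    // Exclude only when the named backend is the one under test, so one
    // settings file can carry annotations for several backends.
    def excludeBackend(backendName: String, testNames: String*): SuiteSettings = {
      if (backendName == currentBackend) exclude(testNames: _*)
      this
    }
  }

  def main(args: Array[String]): Unit = {
    val chSettings = new SuiteSettings("ch")
    chSettings
      .includeBackend("ch", "cast.sql")
      .excludeBackend("ch", "transform.sql")
      .excludeBackend("velox", "map.sql") // ignored for the CH run
    println(chSettings.excludedTests.mkString(","))
  }
}
```

The design choice here is that exclusions for other backends become inert rather than errors, which keeps the settings declarative.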

@baibaichen baibaichen merged commit 4bb28f2 into apache:main Feb 12, 2025
51 checks passed
@baibaichen baibaichen deleted the feature/spark35ut branch February 12, 2025 02:49
baibaichen (Contributor, Author) commented:

We are now preparing to enable Spark 3.5 and disable Spark 3.2.

(screenshot attached)
Labels: CLICKHOUSE, CORE (works for Gluten Core)

Successfully merging this pull request may close these issues.

[CH] Support CH backend with Spark 3.5.x
3 participants