[SPARK-31101][BUILD][3.0] Upgrade Janino to 3.0.16
### What changes were proposed in this pull request?

This PR (SPARK-31101) proposes to upgrade Janino to 3.0.16, which was released recently.

* Merged pull request janino-compiler/janino#114, "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log:

- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog at http://janino-compiler.github.io/janino/changelog.html, though the release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We received reports of user queries failing because Janino throws an error while compiling the generated code. The issue is here: janino-compiler/janino#113. It contains the generated code, the symptom (error), and an analysis of the bug, so please refer to the link for more details.

Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables Janino to compile such queries successfully.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs. The test code below fails on branch-3.0 and passes with this patch.

```
/**
 * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
 * as well as the overall size of expand_doConsume, so that the query triggers known Janino
 * bug - janino-compiler/janino#113.
 *
 * The expected exception message from Janino when we use switch statement for "ExpandExec":
 * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
 * which will not happen when we use if-else-if statement for "ExpandExec".
 *
 * "The number of fields" and "The number of distinct aggregation functions" are the major
 * factors to increase the size of generated code: while these values should be large enough
 * to trigger the Janino bug, these values should not be too big either; otherwise one of the
 * below exceptions might be thrown:
 * - "expand_doConsume would be beyond 64KB"
 * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
 */
test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
  withSQLConf(
    (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
    (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
    (SQLConf.CODEGEN_FALLBACK.key, "false"),
    (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
  ) {
    var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")

    // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
    // the query fails with switch statement, whereas it passes with if-else statement.
    // Note that the value depends on the Spark logic as well - different Spark versions may
    // require a different value to ensure the test fails with switch statement.
    val numNewFields = 100

    df = df.withColumns(
      (1 to numNewFields).map { idx => s"a$idx" },
      (1 to numNewFields).map { idx =>
        when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
      }
    )

    val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
      if (idx % 2 == 0) {
        coalesce(countDistinct(s"a$idx"), lit(0))
      } else {
        coalesce(count(s"a$idx"), lit(0))
      }
    }.toArray

    val aggDf = df
      .groupBy("a", "b")
      .agg(aggExprs.head, aggExprs.tail: _*)

    // We are only interested in whether the code compilation fails or not, so skipping
    // verification on outputs.
    aggDf.collect()
  }
}
```

Closes #27996 from HeartSaVioR/SPARK-31101-branch-3.0.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
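For context on why the size of a generated method matters here: whole-stage codegen emits Java source at runtime and compiles it with Janino, and very large generated methods run into JVM limits (such as the 64KB method bytecode cap mentioned in the test comment above). The sketch below is a minimal, hypothetical illustration of that compile-generated-source-at-runtime pattern. It uses the JDK's built-in `javax.tools` compiler rather than Janino (which is not assumed to be on the classpath), and the `Generated`/`expand` names are made up for the example; it is not Spark's actual codegen path.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CodegenLimitDemo {
    public static void main(String[] args) throws IOException {
        // Build a generated-style method containing a large switch statement,
        // mimicking how whole-stage codegen can inflate a single method.
        StringBuilder src = new StringBuilder();
        src.append("public class Generated {\n");
        src.append("  public static int expand(int i) {\n");
        src.append("    switch (i) {\n");
        for (int c = 0; c < 500; c++) {
            src.append("      case ").append(c).append(": return ").append(c * 2).append(";\n");
        }
        src.append("      default: return -1;\n");
        src.append("    }\n  }\n}\n");

        // Write the generated source to a temp file and compile it with the
        // JDK compiler; javac.run returns 0 on successful compilation.
        Path dir = Files.createTempDirectory("codegen");
        Path file = dir.resolve("Generated.java");
        Files.write(file, src.toString().getBytes(StandardCharsets.UTF_8));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        int result = javac.run(null, null, null, file.toString());
        System.out.println(result == 0 ? "compiled" : "failed");
    }
}
```

With 500 cases this compiles fine; the Janino bug fixed here was triggered not by the switch size alone but by a code-generation defect in Janino 3.0.15 when growing and relocating code for large methods, which 3.0.16 corrects.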