Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31101][BUILD][2.4] Upgrade Janino to 3.0.16 #27997

Closed

Conversation

HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Mar 24, 2020

What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

Please see the commit log.

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

Below test code fails on branch-2.4 and passes with this patch.

(Note that there seems to be the case where another UT affects this UT to not fail - adding this to SQLQuerySuite won't fail this UT, but adding this to DateFunctionsSuite will fail this UT, and if you run this UT solely in SQLQuerySuite via build/sbt "sql/testOnly *.SQLQuerySuite -- -z SPARK-31115" then it fails.)

  /**
   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
   * bug - https://github.com/janino-compiler/janino/issues/113.
   *
   * The expected exception message from Janino when we use switch statement for "ExpandExec":
   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
   * which will not happen when we use if-else-if statement for "ExpandExec".
   *
   * "The number of fields" and "The number of distinct aggregation functions" are the major
   * factors to increase the size of generated code: while these values should be large enough
   * to trigger the Janino bug, these values should not also too big; otherwise one of below
   * exceptions might be thrown:
   * - "expand_doConsume would be beyond 64KB"
   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
   */
  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
    withSQLConf(
      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
      (SQLConf.CODEGEN_FALLBACK.key, "false"),
      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
    ) {
      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")

      // The value is tested under commit "244405fe57d7737d81c34ba9e8917df6285889eb":
      // the query fails with switch statement, whereas it passes with if-else statement.
      // Note that the value depends on the Spark logic as well - different Spark versions may
      // require different value to ensure the test failing with switch statement.
      val numNewFields = 100

      df = df.withColumns(
        (1 to numNewFields).map { idx => s"a$idx" },
        (1 to numNewFields).map { idx =>
          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
        }
      )

      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
        if (idx % 2 == 0) {
          coalesce(countDistinct(s"a$idx"), lit(0))
        } else {
          coalesce(count(s"a$idx"), lit(0))
        }
      }.toArray

      val aggDf = df
        .groupBy("a", "b")
        .agg(aggExprs.head, aggExprs.tail: _*)

      // We are only interested in whether the code compilation fails or not, so skipping
      // verification on outputs.
      aggDf.collect()
    }
  }

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

No.

Existing UTs.

Closes apache#27932 from HeartSaVioR/SPARK-31101-janino-3.0.16.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@SparkQA
Copy link

SparkQA commented Mar 24, 2020

Test build #120240 has finished for PR 27997 at commit d7ba301.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Mar 24, 2020

Test build #120249 has finished for PR 27997 at commit d7ba301.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Hi, @HeartSaVioR .
Could you update the PR description, please?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 25, 2020

Also, cc @gatorsmile and @cloud-fan since the situation is different from master and branch-3.0.

- commons-compiler/3.0.9//commons-compiler-3.0.9.jar
+ commons-compiler/3.0.16//commons-compiler-3.0.16.jar

@HeartSaVioR
Copy link
Contributor Author

Just added the UT in the description of PR. Thanks!

@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR !

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 29, 2020

Hi, @HeartSaVioR . I cannot reproduce the failure with the given example and in 244405fe57d7737d81c34ba9e8917df6285889eb at branch-2.4. Could you help me verify this PR?

$ git checkout 244405fe57d7737d81c34ba9e8917df6285889eb
$ // Add the test case to SQLQuerySuite
$ build/sbt "sql/testOnly *.SQLQuerySuite"
...
[info] - SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation (1 second, 619 milliseconds)
...

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Mar 29, 2020

It seems odd - I've just played with it a bit more, and added my observation to PR description. Quoting here again:

Note that there seems to be the case where another UT affects this UT to not fail - adding this to SQLQuerySuite won't fail this UT, but adding this to DateFunctionsSuite will fail this UT, and if you run this UT solely in SQLQuerySuite via build/sbt "sql/testOnly *.SQLQuerySuite -- -z SPARK-31115" then it fails.

I don't think I should investigate to make it fail consistently - if that's the goal I would just add the UT to the code. The thing is that there's a case the query fails, and this PR fixes it.

@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR ! I reproduced it by build/sbt "sql/testOnly *.SQLQuerySuite -- -z SPARK-31115" as you wrote.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Mar 30, 2020
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Below test code fails on branch-2.4 and passes with this patch.

(Note that there seems to be the case where another UT affects this UT to not fail - adding this to SQLQuerySuite won't fail this UT, but adding this to DateFunctionsSuite will fail this UT, and if you run this UT solely in SQLQuerySuite via `build/sbt "sql/testOnly *.SQLQuerySuite -- -z SPARK-31115"` then it fails.)

```
  /**
   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
   * bug - janino-compiler/janino#113.
   *
   * The expected exception message from Janino when we use switch statement for "ExpandExec":
   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
   * which will not happen when we use if-else-if statement for "ExpandExec".
   *
   * "The number of fields" and "The number of distinct aggregation functions" are the major
   * factors to increase the size of generated code: while these values should be large enough
   * to trigger the Janino bug, these values should not also too big; otherwise one of below
   * exceptions might be thrown:
   * - "expand_doConsume would be beyond 64KB"
   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
   */
  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
    withSQLConf(
      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
      (SQLConf.CODEGEN_FALLBACK.key, "false"),
      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
    ) {
      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")

      // The value is tested under commit "244405fe57d7737d81c34ba9e8917df6285889eb":
      // the query fails with switch statement, whereas it passes with if-else statement.
      // Note that the value depends on the Spark logic as well - different Spark versions may
      // require different value to ensure the test failing with switch statement.
      val numNewFields = 100

      df = df.withColumns(
        (1 to numNewFields).map { idx => s"a$idx" },
        (1 to numNewFields).map { idx =>
          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
        }
      )

      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
        if (idx % 2 == 0) {
          coalesce(countDistinct(s"a$idx"), lit(0))
        } else {
          coalesce(count(s"a$idx"), lit(0))
        }
      }.toArray

      val aggDf = df
        .groupBy("a", "b")
        .agg(aggExprs.head, aggExprs.tail: _*)

      // We are only interested in whether the code compilation fails or not, so skipping
      // verification on outputs.
      aggDf.collect()
    }
  }
```

Closes #27997 from HeartSaVioR/SPARK-31101-branch-2.4.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Thank you, @HeartSaVioR , @HyukjinKwon , @maropu !

@HeartSaVioR
Copy link
Contributor Author

Thanks as always for your details, @dongjoon-hyun ! Thanks all for reviewing and merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants