-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21708][BUILD] Migrate build to sbt 1.x #29286
Conversation
It would be nice if we can manage to upgrade! |
Yeah, it's really time to get there :) Right now first blocker is known issue with sbt-pom-reader not issuing test-scoped dependencies. I'll proceed with solving that (probably by submitting PR there). |
IIRC, the sbt-pom-reader is a fork. As a temporary solution, we might need to fix it there. Or, we should resolve [this issue] to don't use our fork. See also #27307 (comment) |
Yep, I had seen a reason to use forked one. I would prefer to fix it on upstream if possible. |
+1 this would be great to fix. I tried upgrading a long time ago and hit problems I didn't know how to solve. Hope you can! |
Jenkins test this please |
Test build #126988 has finished for PR 29286 at commit
|
@srowen it'd fail at this moment without a chance. |
I'm not sure that it'd be invoked, but let's see :) Jenkins test this please |
@gemelen Thanks for doing this work ! I saw that your other PR is already merged to sbt-pom-reader. Hoping that we get rid of that fork. If you need any help, feel free to ping me. |
Could please someone request Jenkins build for me? Regarding GH Action tests failing - I'll debug it in separate PR, since it sounds like some simple issue, but needs few temporary logging enhancements. |
Jenkins, retest this please. |
retest this please |
Test build #127854 has finished for PR 29286 at commit
|
I found out that GH Action fails because files in in .github/workflows/master.yml
@HyukjinKwon could you please advise how to proceed with that issue? |
retest this please |
Test build #127863 has finished for PR 29286 at commit
|
Ok, MiMa is to be upgraded too :) |
I made a PR to fix GitHub Actions at #29536. Thanks for investigation. |
I've applied changes for sbt-mima plugin upgrade and there are compatibility issues reported by its task. |
…tHub Actions ### What changes were proposed in this pull request? This PR proposes to explicitly cache and hash the files/directories under 'build' for SBT and Zinc at GitHub Actions. Otherwise, it can end up with overwriting `build` directory. See also #29286 (comment) Previously, other files like `build/mvn` and `build/sbt` are also cached and overwritten. So, when you have some changes there, they are ignored. ### Why are the changes needed? To make GitHub Actions build stable. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? The builds in this PR test it out. Closes #29536 from HyukjinKwon/SPARK-32695. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…tHub Actions ### What changes were proposed in this pull request? This PR proposes to explicitly cache and hash the files/directories under 'build' for SBT and Zinc at GitHub Actions. Otherwise, it can end up with overwriting `build` directory. See also #29286 (comment) Previously, other files like `build/mvn` and `build/sbt` are also cached and overwritten. So, when you have some changes there, they are ignored. ### Why are the changes needed? To make GitHub Actions build stable. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? The builds in this PR test it out. Closes #29536 from HyukjinKwon/SPARK-32695. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org> (cherry picked from commit b07e742) Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…tHub Actions ### What changes were proposed in this pull request? This PR proposes to explicitly cache and hash the files/directories under 'build' for SBT and Zinc at GitHub Actions. Otherwise, it can end up with overwriting `build` directory. See also #29286 (comment) Previously, other files like `build/mvn` and `build/sbt` are also cached and overwritten. So, when you have some changes there, they are ignored. ### Why are the changes needed? To make GitHub Actions build stable. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? The builds in this PR test it out. Closes #29536 from HyukjinKwon/SPARK-32695. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org> (cherry picked from commit b07e742) Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@gemelen, once you rebase, GitHub Actions build should not overwrite |
@gemelen thanks for the work here - does the latest change unblock you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, @gemelen .
BTW, could you make another PR for mima change? We need to do that separately like SPARK-26030 (Bump previousSparkVersion in MimaBuild.scala to be 2.4.0). According to #22977 (comment) and #22977 (comment), we need to use 3.0.0 when 3.0.0 is released.
def mimaSettings(sparkHome: File, projectRef: ProjectRef): Seq[Setting[_]] = {
val organization = "org.apache.spark"
- val previousSparkVersion = "2.4.0"
+ val previousSparkVersion = "3.0.0"
cc @gatorsmile , @cloud-fan
I'm afraid that it's not viable: UPD: previousSparkVersion may be changed on backporting to 2.4 branch, so it won't be a problem there |
Oh, thank you for the context, @gemelen . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, thanks. I think we can merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. I agree that. We can merge this.
Hi, @gemelen . What is your Apache JIRA ID? |
@dongjoon-hyun the same as github username, |
Thanks. You are added to the Apache Spark contributor group and SPARK-21708 is assigned to you. |
Hi, All. It turns out that BEFORE THIS COMMIT
AFTER THIS COMMIT
|
The failure is tracked by SPARK-33104 . |
# limitations under the License. | ||
|
||
-J-Xmx4G | ||
-J-Xss4m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gemelen sorry for the very late comment but do you remember why we added this? The default memory has set in build/sbt-launch-lib.bash
(e.g., see 35bab33). Were you using plain sbt
instead of build/sbt
in your local?
This file disables the memory option from the build/sbt
script:
./build/sbt -mem 6144
.../jdk-11.0.3.jdk/Contents/Home as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size
because it adds these memory options at the last:
/.../bin/java -Xms6144m -Xmx6144m -XX:ReservedCodeCacheSize=256m -Xmx4G -Xss4m -jar build/sbt-launch-1.5.0.jar 1
and Java respects the rightmost memory configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, okay it sets the stack size here. I misread it. Okay but I will still move this to the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon AFAIR stack size increase was introduced to overcome failures on some tasks (tests probably). Yep, definitely could be set anywhere it is more suitable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message says it was for Github Actions env in fact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explanation!
…t memory ### What changes were proposed in this pull request? This PR removes `.sbtopts` (added in #29286) that duplicately sets the default memory. The default memories are set: https://github.com/apache/spark/blob/3b634f66c3e4a942178a1e322ae65ce82779625d/build/sbt-launch-lib.bash#L119-L124 ### Why are the changes needed? This file disables the memory option from the `build/sbt` script: ```bash ./build/sbt -mem 6144 ``` ``` .../jdk-11.0.3.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Error occurred during initialization of VM Initial heap size set to a larger value than the maximum heap size ``` because it adds these memory options at the last: ```bash /.../bin/java -Xms6144m -Xmx6144m -XX:ReservedCodeCacheSize=256m -Xmx4G -Xss4m -jar build/sbt-launch-1.5.0.jar ``` and Java respects the rightmost memory configurations. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually ran SBT. It will be tested in the CIs in this Pr. Closes #32062 from HyukjinKwon/SPARK-34965. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…t memory ### What changes were proposed in this pull request? This PR removes `.sbtopts` (added in #29286) that duplicately sets the default memory. The default memories are set: https://github.com/apache/spark/blob/3b634f66c3e4a942178a1e322ae65ce82779625d/build/sbt-launch-lib.bash#L119-L124 ### Why are the changes needed? This file disables the memory option from the `build/sbt` script: ```bash ./build/sbt -mem 6144 ``` ``` .../jdk-11.0.3.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Error occurred during initialization of VM Initial heap size set to a larger value than the maximum heap size ``` because it adds these memory options at the last: ```bash /.../bin/java -Xms6144m -Xmx6144m -XX:ReservedCodeCacheSize=256m -Xmx4G -Xss4m -jar build/sbt-launch-1.5.0.jar ``` and Java respects the rightmost memory configurations. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually ran SBT. It will be tested in the CIs in this Pr. Closes #32062 from HyukjinKwon/SPARK-34965. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 2eda1c6) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Migrate sbt-launcher URL to download one for sbt 1.x. Update plugins versions where required by sbt update. Change sbt version to be used to latest released at the moment, 1.3.13 Adjust build settings according to plugins and sbt changes. Migration to sbt 1.x: 1. enhances dev experience in development 2. updates build plugins to bring there new features/to fix bugs in them 3. enhances build performance on sbt side 4. eases movement to Scala 3 / dotty No. All existing tests passed, both on Jenkins and via Github Actions, also manually for Scala 2.13 profile. Closes apache#29286 from gemelen/feature/sbt-1.x. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Migrate sbt-launcher URL to download one for sbt 1.x. Update plugins versions where required by sbt update. Change sbt version to be used to latest released at the moment, 1.3.13 Adjust build settings according to plugins and sbt changes. Migration to sbt 1.x: 1. enhances dev experience in development 2. updates build plugins to bring there new features/to fix bugs in them 3. enhances build performance on sbt side 4. eases movement to Scala 3 / dotty No. All existing tests passed, both on Jenkins and via Github Actions, also manually for Scala 2.13 profile. Closes apache#29286 from gemelen/feature/sbt-1.x. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Migrate sbt-launcher URL to download one for sbt 1.x. Update plugins versions where required by sbt update. Change sbt version to be used to latest released at the moment, 1.3.13 Adjust build settings according to plugins and sbt changes. Migration to sbt 1.x: 1. enhances dev experience in development 2. updates build plugins to bring there new features/to fix bugs in them 3. enhances build performance on sbt side 4. eases movement to Scala 3 / dotty No. All existing tests passed, both on Jenkins and via Github Actions, also manually for Scala 2.13 profile. Closes apache#29286 from gemelen/feature/sbt-1.x. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Migrate sbt-launcher URL to download one for sbt 1.x. Update plugins versions where required by sbt update. Change sbt version to be used to latest released at the moment, 1.3.13 Adjust build settings according to plugins and sbt changes. Migration to sbt 1.x: 1. enhances dev experience in development 2. updates build plugins to bring there new features/to fix bugs in them 3. enhances build performance on sbt side 4. eases movement to Scala 3 / dotty No. All existing tests passed, both on Jenkins and via Github Actions, also manually for Scala 2.13 profile. Closes apache#29286 from gemelen/feature/sbt-1.x. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request? This PR removes sbt-avro plugin dependency. In the current master, Build with SBT depends on the plugin but it seems never used. Originally, the plugin was introduced for `flume-sink` in SPARK-1729 (#807) but `flume-sink` is no longer in Spark repository. After SBT was upgraded to 1.x in SPARK-21708 (#29286), `avroGenerate` part was introduced in `object SQL` in `SparkBuild.scala`. It's confusable but I understand `Test / avroGenerate := (Compile / avroGenerate).value` is for suppressing sbt-avro for `sql` sub-module. In fact, Test/compile will fail if `Test / avroGenerate :=(Compile / avroGenerate).value` is commented out. `sql` sub-module contains `parquet-compat.avpr` and `parquet-compat.avdl` but according to `sql/core/src/test/README.md`, they are intended to be handled by `gen-avro.sh`. Also, in terms of Maven build, there seems to be no definition to handle `*.avpr` or `*.avdl`. Based on the above, I think we can remove `sbt-avro`. ### Why are the changes needed? If `sbt-avro` is really no longer used, it's confusable that `sbt-avro` related configurations are in `SparkBuild.scala`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA. Closes #33190 from sarutak/remove-avro-from-sbt. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? This PR removes sbt-avro plugin dependency. In the current master, Build with SBT depends on the plugin but it seems never used. Originally, the plugin was introduced for `flume-sink` in SPARK-1729 (#807) but `flume-sink` is no longer in Spark repository. After SBT was upgraded to 1.x in SPARK-21708 (#29286), `avroGenerate` part was introduced in `object SQL` in `SparkBuild.scala`. It's confusable but I understand `Test / avroGenerate := (Compile / avroGenerate).value` is for suppressing sbt-avro for `sql` sub-module. In fact, Test/compile will fail if `Test / avroGenerate :=(Compile / avroGenerate).value` is commented out. `sql` sub-module contains `parquet-compat.avpr` and `parquet-compat.avdl` but according to `sql/core/src/test/README.md`, they are intended to be handled by `gen-avro.sh`. Also, in terms of Maven build, there seems to be no definition to handle `*.avpr` or `*.avdl`. Based on the above, I think we can remove `sbt-avro`. ### Why are the changes needed? If `sbt-avro` is really no longer used, it's confusable that `sbt-avro` related configurations are in `SparkBuild.scala`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA. Closes #33190 from sarutak/remove-avro-from-sbt. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 6c4616b) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…t memory ### What changes were proposed in this pull request? This PR removes `.sbtopts` (added in apache#29286) that duplicately sets the default memory. The default memories are set: https://github.com/apache/spark/blob/3b634f66c3e4a942178a1e322ae65ce82779625d/build/sbt-launch-lib.bash#L119-L124 ### Why are the changes needed? This file disables the memory option from the `build/sbt` script: ```bash ./build/sbt -mem 6144 ``` ``` .../jdk-11.0.3.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Error occurred during initialization of VM Initial heap size set to a larger value than the maximum heap size ``` because it adds these memory options at the last: ```bash /.../bin/java -Xms6144m -Xmx6144m -XX:ReservedCodeCacheSize=256m -Xmx4G -Xss4m -jar build/sbt-launch-1.5.0.jar ``` and Java respects the rightmost memory configurations. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually ran SBT. It will be tested in the CIs in this Pr. Closes apache#32062 from HyukjinKwon/SPARK-34965. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 2eda1c6) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
Migrate sbt-launcher URL to download one for sbt 1.x.
Update plugins versions where required by sbt update.
Change sbt version to be used to latest released at the moment, 1.3.13
Adjust build settings according to plugins and sbt changes.
Why are the changes needed?
Migration to sbt 1.x:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
All existing tests passed, both on Jenkins and via Github Actions, also manually for Scala 2.13 profile.