Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30572][BUILD] Add a fallback Maven repository #27281

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .github/workflows/master.yml
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,13 @@ jobs:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
export MAVEN_CLI_OPTS="--no-transfer-progress"
mkdir -p ~/.m2
# `Maven Central` is too flaky in terms of downloading artifacts in `GitHub Action` environment.
# `Google Maven Central Mirror` is too slow in terms of sycing upstream. To get the best combination,
# 1) we set `Google Maven Central` as a mirror of `central` in `GitHub Action` environment only.
# 2) we duplicates `Maven Central` in pom.xml with ID `central_without_mirror`.
# In other words, in GitHub Action environment, `central` is mirrored by `Google Maven Central` first.
# If `Google Maven Central` doesn't provide the artifact due to its slowness, `central_without_mirror` will be used.
# Note that we aim to achieve the above while keeping the existing behavior of non-`GitHub Action` environment unchanged.
echo "<settings><mirrors><mirror><id>google-maven-central</id><name>GCS Maven Central mirror</name><url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url><mirrorOf>central</mirrorOf></mirror></mirrors></settings>" > ~/.m2/settings.xml
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -P${{ matrix.hive }} -Phive-thriftserver -P${{ matrix.hadoop }} -Phadoop-cloud -Djava.version=${{ matrix.java }} install
rm -rf ~/.m2/repository/org/apache/spark
Expand Down
16 changes: 16 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,22 @@
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I get it. At this point though, would it just be easier to declare the Google repo above central above, and not use the mirror settings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I'm not sure that we can declare Google Maven Central as Apache Spark default repo because not everyone welcome Google repo. In Apache Spark, we only use this in GitHub Action. The other build environment, we don't use Google Maven Central at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure, @srowen ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it could be listed second. I don't see a problem with including a mirror as a repository here, as it's just a mirror of central. (We have had in the past, even, vendor-specific repos.) I hope it would have the same effect in improving robustness as anyone running the build would have it look at both repos before failing.

<id>central_without_mirror</id>
<!--
This is used as a fallback when a mirror to `central` fail.
For example, when we use Google Maven Central in GitHub Action as a mirror of `central`,
this will be used when Google Maven Central is out of sync due to its late sync cycle.
-->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I mistakenly thought the maven automatically would do the fallback in the mirror failure case.

How about leaving some comments, too, about this behaivour in the script of GitHub Actions?

echo "<settings><mirrors><mirror><id>google-maven-central</id><name>GCS Maven Central mirror</name><url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url><mirrorOf>central</mirrorOf></mirror></mirrors></settings>" > ~/.m2/settings.xml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, @maropu .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

<name>Maven Repository</name>
<url>https://repo.maven.apache.org/maven2</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
Expand Down