[SPARK-50295][INFRA] Add a script to build docs with image #48860
Conversation
# 3.build docs on container: `error docs`, `scala doc`, `python doc`, `sql doc`
docker run \
--mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
if the container is going to write files to the mounted path, please make sure permission won't bother the user accessing/deleting from host. for example, if the container writes files with uid 0, the host user may have no permission to delete them.
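One possible mitigation in that case (a minimal sketch, not the PR's actual logic; it reuses the variable names from the diff above) is to run the container as the host user, so anything written through the bind mount stays owned by that user:

```bash
# Sketch only: pass the host UID/GID so files created under the mount
# are owned by the host user rather than root.
docker run \
  --user "$(id -u):$(id -g)" \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  --interactive --tty "${IMG_URL}" \
  /bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"
```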
(base) ➜ spark-community git:(SPARK-50295) ✗ ls -al docs/api
total 0
drwxr-xr-x 7 panbingkun staff 224 Nov 18 19:52 .
drwx------@ 286 panbingkun staff 9152 Nov 18 19:25 ..
drwxr-xr-x@ 22 panbingkun staff 704 Nov 18 19:52 R
drwxr-xr-x 26 panbingkun staff 832 Nov 18 19:25 java
drwxr-xr-x 15 panbingkun staff 480 Nov 18 19:47 python
drwxr-xr-x 6 panbingkun staff 192 Nov 18 19:25 scala
drwxr-xr-x 11 panbingkun staff 352 Nov 18 19:49 sql
The verification process is as follows:
sh dev/spark-test-image/docs/build-docs-on-local
[info] Note: Some input files use or override a deprecated API.
[info] Note: Recompile with -Xlint:deprecation for details.
[warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
[success] Total time: 46 s, completed Nov 18, 2024, 7:23:38 PM
[+] Building 8.4s (1/2) docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 93.0s (13/13) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.81kB 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:jammy-20240911.1 88.5s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> importing cache manifest from ghcr.io/apache/spark/apache-spark-github-action-image-docs-cache:master 4.4s
=> => inferred cache manifest type: application/vnd.oci.image.index.v1+json 0.0s
=> [1/7] FROM docker.io/library/ubuntu:jammy-20240911.1@sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97 0.0s
=> => resolve docker.io/library/ubuntu:jammy-20240911.1@sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97 0.0s
=> [auth] apache/spark/apache-spark-github-action-image-docs-cache:pull token for ghcr.io 0.0s
=> CACHED [2/7] RUN apt-get update && apt-get install -y build-essential ca-certificates curl gfortran git gnupg libcurl4-openssl-dev libfontconfig1-dev li 0.0s
=> CACHED [3/7] RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', 'rmarkdown', 'testthat'), repos='https://cloud.r-project.org/')" && Rscript -e "devtools::install_versi 0.0s
=> CACHED [4/7] RUN add-apt-repository ppa:deadsnakes/ppa 0.0s
=> CACHED [5/7] RUN apt-get update && apt-get install -y python3.9 python3.9-distutils && rm -rf /var/lib/apt/lists/* 0.0s
=> CACHED [6/7] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9 0.0s
=> CACHED [7/7] RUN python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' ipython ipython_genutils 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => exporting manifest sha256:86549617bcf8050c8b39402be5679e3663adf07de19894b872f11598c173c935 0.0s
=> => exporting config sha256:f8e2afeca787583d05cb7572d71d7ccff2fb8c7f8ec7a4b2e6f61d6fb3061d8d 0.0s
=> => exporting attestation manifest sha256:4998eaa40eace447f735eccf12114f8b63d9b975e33aa98133ee9944f7b3751d 0.0s
=> => exporting manifest list sha256:c319e5c6347755831e6cf998bff702f737769a3cad4839965c9c0322f50f7ea7 0.0s
=> => naming to docker.io/apache/spark/apache-spark-ci-image-docs:1731928756 0.0s
=> => unpacking to docker.io/apache/spark/apache-spark-ci-image-docs:1731928756 0.0s
5 warnings found (use docker --debug to expand):
- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 30)
- UndefinedVar: Usage of undefined variable '$R_LIBS_SITE' (line 75)
- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 75)
- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 27)
- LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 29)
Fetching bundler-2.4.22.gem
Successfully installed bundler-2.4.22
Parsing documentation for bundler-2.4.22
Installing ri documentation for bundler-2.4.22
Done installing documentation for bundler after 0 seconds
1 gem installed
Don't run Bundler as root. Installing your bundle as root will break this application for all non-root users on this machine.
Bundle complete! 4 Gemfile dependencies, 32 gems now installed.
Bundled gems are installed into `./.local_ruby_bundle`
Configuration file: /__w/spark/spark/docs/_config.yml
************************
* Building error docs. *
************************
Generated: docs/_generated/error-conditions.html
*************************************
* Building Scala and Java API docs. *
*************************************
Moving back into docs dir.
Removing old docs
Making directory api/scala
cp -r ../target/scala-2.13/unidoc/. api/scala
Making directory api/java
cp -r ../target/javaunidoc/. api/java
Updating JavaDoc files for badge post-processing
Copying jquery.min.js from Scala API to Java API for page post-processing of badges
Copying api_javadocs.js to Java API for page post-processing of badges
Appending content of api-javadocs.css to JavaDoc stylesheet.css for badge styles
*****************************
* Building Python API docs. *
*****************************
Running Sphinx v4.5.0
/__w/spark/spark/python/pyspark/pandas/__init__.py:43: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
warnings.warn(
loading pickled environment... done
[autosummary] generating autosummary for: development/contributing.rst, development/debugging.rst, development/errors.rst, development/index.rst, development/logger.rst, development/setting_ide.rst, development/testing.rst, getting_started/index.rst, getting_started/install.rst, getting_started/quickstart_connect.ipynb, ..., user_guide/pandas_on_spark/transform_apply.rst, user_guide/pandas_on_spark/typehints.rst, user_guide/pandas_on_spark/types.rst, user_guide/python_packaging.rst, user_guide/sql/arrow_pandas.rst, user_guide/sql/dataframe_column_selections.rst, user_guide/sql/index.rst, user_guide/sql/python_data_source.rst, user_guide/sql/python_udtf.rst, user_guide/sql/type_conversions.rst
[autosummary] generating autosummary for: /__w/spark/spark/python/docs/source/reference/api/pyspark.Accumulator.add.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.Accumulator.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.Accumulator.value.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.AccumulatorParam.addInPlace.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.AccumulatorParam.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.AccumulatorParam.zero.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.allGather.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.attemptNumber.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.barrier.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.cpus.rst, ..., /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQuery.status.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQuery.stop.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryListener.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.active.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.addListener.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.awaitAnyTermination.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.get.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.removeListener.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.resetTerminated.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.rst
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2298 source files that are out of date
updating environment: 0 added, 2298 changed, 0 removed
reading sources... [100%] reference/pyspark.sql/spark_session .. user_guide/pandas_on_spark/supported_pandas_api
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] reference/pyspark.ss/api/pyspark.sql.streaming.DataStreamReader .. user_guide/pandas_on_spark/supported_pandas_api
waiting for workers...
generating indices... done
highlighting module code... [100%] pyspark.util
writing additional pages... search done
copying images... [100%] ../../../docs/img/pyspark-spark_core_and_rdds.png
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.
The HTML pages are in build/html.
Moving back into docs dir.
Making directory api/python
cp -r ../python/docs/build/html/. api/python
**************************
* Building SQL API docs. *
**************************
Generating SQL API Markdown files.
WARNING: Using incubator modules: jdk.incubator.vector
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
SELECT xpath_boolean('<a><b>1</b></a>','a/b');
SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
Generating HTML files for SQL API documentation.
INFO - Cleaning site directory
INFO - Building documentation to directory: /__w/spark/spark/sql/site
INFO - Documentation built in 0.79 seconds
/__w/spark/spark/sql
Moving back into docs dir.
Making directory api/sql
cp -r ../sql/site/. api/sql
Source: /__w/spark/spark/docs
Destination: /__w/spark/spark/docs/_site
Incremental build: disabled. Enable with --incremental
Generating...
done in 23.136 seconds.
Auto-regeneration: disabled. Use --watch to enable.
Configuration file: /Users/panbingkun/Developer/spark/spark-community/docs/_config.yml
************************
* Building R API docs. *
************************
Using Scala 2.13
Using R_SCRIPT_PATH = /usr/local/bin
── Installing package SparkR into temporary library ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Building pkgdown site for package SparkR ────────────────────────────────────
Reading from: /Users/panbingkun/Developer/spark/spark-community/R/pkg
Writing to: /Users/panbingkun/Developer/spark/spark-community/R/pkg/docs
── Sitrep ──────────────────────────────────────────────────────────────────────
✖ URLs not ok.
In DESCRIPTION, URL is missing package url
(https://spark.apache.org/docs/4.0.0/api/R).
See details in `vignette(pkgdown::metadata)`.
✔ Favicons ok.
✔ Open graph metadata ok.
✔ Articles metadata ok.
✔ Reference metadata ok.
── Initialising site ───────────────────────────────────────────────────────────
── Building home ───────────────────────────────────────────────────────────────
Reading README.md
Writing 404.html
── Building function reference ─────────────────────────────────────────────────
Warning: SparkR is deprecated in Apache Spark 4.0.0 and will be removed in a future release. To continue using Spark in R, we recommend using sparklyr instead: https://spark.posit.co/get-started/
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
cov, filter, lag, na.omit, predict, sd, var, window
Reading man/write.stream.Rd
Reading man/write.text.Rd
── Building articles ───────────────────────────────────────────────────────────
Reading vignettes/sparkr-vignettes.Rmd
Writing articles/sparkr-vignettes.html
── Building sitemap ────────────────────────────────────────────────────────────
── Building redirects ──────────────────────────────────────────────────────────
── Building search index ───────────────────────────────────────────────────────
── Checking for problems ───────────────────────────────────────────────────────
── Finished building pkgdown site for package SparkR ───────────────────────────
── Finished building pkgdown site for package SparkR ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Warning messages:
1: Failed to parse usage: `` array_aggregate(x, initialValue, merge, ...) array_contains(x, value) array_distinct(x) array_except(x, y) array_exists(x, f) array_forall(x, f) array_filter(x, f)
array_intersect(x, y) array_join(x, delimiter, ...) array_max(x) array_min(x) array_position(x, value) array_remove(x, value) array_repeat(x, count) array_sort(x, ...) array_transform(x, f)
arrays_overlap(x, y) array_union(x, y) arrays_zip(x, ...) arrays_zip_with(x, y, f) concat(x, ...) element_at(x, extraction) explode(x) explode_outer(x) flatten(x) from_json(x, schema, ...)
from_csv(x, schema, ...) map_concat(x, ...) map_entries(x) map_filter(x, f) map_from_arrays(x, y) map_from_entries(x) map_keys(x) map_values(x) map_zip_with(x, y, f) posexplode(x)
posexplode_outer(x) reverse(x) schema_of_csv(x, ...) schema_of_json(x, ...) shuffle(x) size(x) slice(x, start, length) sort_array(x, asc = TRUE) transform_keys(x, f) transform_values(x, f)
to_json(x, ...) to_csv(x, ...) S4method(`reverse`, list(`Column`))(x) S4method(`to_json`, list(`Column`))(x, ...) S4method(`to_csv`, list(`Column`))(x, ...) S4method(`concat`, list(`Column`))(x,
...) S4method(`from_json`, list(`Column`,`characterOrstructTypeOrColumn`))(x, schema, as.json.array = FALSE, ...) S4method(`schema_of_json`, list(`characterOrColumn`))(x, ...) S4method(`from_csv`,
list(`Column`,`characterOrstructTypeOrColumn`))(x, schema, ...) S4method(`schema_of_csv`, list(`characterOrColumn`))(x, ...) S4method(`array_aggregate`,
list(`characterOrColumn`,`Column`,``function``))(x, initialValue, merge, finish = NULL) S4method(`array_contains`, list(`Column`))(x, value) S4method(`array_distinct`, list(`Column`))(x)
S4method(`array_except`, list(`Column`,`Column`))(x, y) S4method(`array_exists`, list(`characterOrColumn`,``function``))(x, f) S4method(`array_filter`, list(`characterOrColumn`,``function``))(x, f)
S4method(`array_forall`, list(`characterOrColumn`,``function``))(x, f) S4method(`array_intersect`, list(`Column`,`Column`))(x, y) S4method(`array_join`, list(`Column`,`character`))(x, delimiter,
nullReplacement = NULL) S4method(`array_max`, list(`Column`))(x) S4method(`array_min`, list(`Column`))(x) S4method(`array_position`, list(`Column`))(x, value) S4method(`array_remove`,
list(`Column`))(x, value) S4method(`array_repeat`, list(`Column`,`numericOrColumn`))(x, count) S4method(`array_sort`, list(`Column`))(x, comparator = NULL) S4method(`array_transform`,
list(`characterOrColumn`,``function``))(x, f) S4method(`arrays_overlap`, list(`Column`,`Column`))(x, y) S4method(`array_union`, list(`Column`,`Column`))(x, y) S4method(`arrays_zip`,
list(`Column`))(x, ...) S4method(`arrays_zip_with`, list(`characterOrColumn`,`characterOrColumn`,``function``))(x, y, f) S4method(`shuffle`, list(`Column`))(x) S4method(`flatten`, list(`Column`))(x)
S4method(`map_concat`, list(`Column`))(x, ...) S4method(`map_entries`, list(`Column`))(x) S4method(`map_filter`, list(`characterOrColumn`,``function``))(x, f) S4method(`map_from_arrays`,
list(`Column`,`Column`))(x, y) S4method(`map_from_entries`, list(`Column`))(x) S4method(`map_keys`, list(`Column`))(x) S4method(`transform_keys`, list(`characterOrColumn`,``function``))(x, f)
S4method(`transform_values`, list(`characterOrColumn`,``function``))(x, f) S4method(`map_values`, list(`Column`))(x) S4method(`map_zip_with`,
list(`characterOrColumn`,`characterOrColumn`,``function``))(x, y, f) S4method(`element_at`, list(`Column`))(x, extraction) S4method(`explode`, list(`Column`))(x) S4method(`size`, list(`Column`))(x)
S4method(`slice`, list(`Column`))(x, start, length) S4method(`sort_array`, list(`Column`))(x, asc = TRUE) S4method(`posexplode`, list(`Column`))(x) S4method(`explode_outer`, list(`Column`))(x)
S4method(`posexplode_outer`, list(`Column`))(x) ``
2: Failed to parse usage: `` dapply(x, func, schema) S4method(`dapply`, list(`SparkDataFrame`,``function``,`characterOrstructType`))(x, func, schema) ``
3: Failed to parse usage: `` dapplyCollect(x, func) S4method(`dapplyCollect`, list(`SparkDataFrame`,``function``))(x, func) ``
+ rm ../_pkgdown.yml
+ popd
~/Developer/spark/spark-community/R ~/Developer/spark/spark-community/R ~/Developer/spark/spark-community/R
+ popd
~/Developer/spark/spark-community/R ~/Developer/spark/spark-community/R
Moving back into docs dir.
Making directory api/R
cp -r ../R/pkg/docs/. api/R
Source: /Users/panbingkun/Developer/spark/spark-community/docs
Destination: /Users/panbingkun/Developer/spark/spark-community/docs/_site
Incremental build: disabled. Enable with --incremental
Generating...
done in 14.093 seconds.
Auto-regeneration: disabled. Use --watch to enable.
Untagged: apache/spark/apache-spark-ci-image-docs:1731928756
Deleted: sha256:c319e5c6347755831e6cf998bff702f737769a3cad4839965c9c0322f50f7ea7
Build doc done.
Dumb question: can we move the scripts to another directory?
Allow me to try it.
I have moved the script from
Will verify the script later.
Thank you very much! ❤️
The script can be executed successfully, thank you very much, @panbingkun. However, should the final generated results only exist in the
@LuciferYang Thank you very much for helping to verify! ❤️ I think the above issue is due to a problem with the script spark/docs/_plugins/build_api_docs.rb (lines 189 to 193 in e03319f).
Can we solve this issue with a new separate PR?
Yeah, it's fine with me to add some cleanup logic in a separate follow-up. It would be friendlier for local builds.
LGTM, but it would be best to wait for @zhengruifeng to take another look.
Add a note (I communicated privately with @LuciferYang and confirmed this).
(base) ➜ .docker pwd
/Users/panbingkun/.docker
(base) ➜ .docker cat daemon.json
{
"builder": {
"gc": {
"defaultKeepStorage": "20GB",
"enabled": true
}
},
"experimental": false,
"registry-mirrors": [
"https://registry.docker-cn.com",
"http://hub-mirror.c.163.com",
"https://docker.mirrors.ustc.edu.cn",
"https://dockerhub.azk8s.cn",
"https://mirror.ccs.tencentyun.com",
"https://registry.cn-hangzhou.aliyuncs.com",
"https://docker.mirrors.ustc.edu.cn",
"https://docker.1panel.live",
"https://atomhub.openatom.cn/",
"https://hub.uuuadc.top",
"https://docker.anyhub.us.kg",
"https://dockerhub.jobcher.com",
"https://dockerhub.icu",
"https://docker.ckyl.me",
"https://docker.awsl9527.cn"
]
}
--interactive --tty "${IMG_URL}" \
/bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"
# 4.Build docs on host: `r doc`. |
Given that SparkR is deprecated and has fewer changes, how about respecting `SKIP_RDOC` here, so that developers don't need to have an R env installed on the host?
Good point, I have also thought about this question.
What do you all think about it?
This would make the script look much more concise and pretty!
On second thought, we can also pass through `SKIP_ERRORDOC`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, and `SKIP_SQLDOC` into the container, so that all existing flags still work as-is (roughly as sketched below).
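A hedged sketch of that passthrough (illustrative only; `docker run -e VAR` forwards the host's current value of `VAR` into the container):

```bash
# Sketch only: forward the existing docs-build flags from the host
# environment into the container so they keep working as-is.
docker run \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  -e SKIP_ERRORDOC -e SKIP_SCALADOC -e SKIP_PYTHONDOC -e SKIP_SQLDOC \
  --interactive --tty "${IMG_URL}" \
  /bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"
```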
There is a difference: we do not want to run `sbt compile` in the container, because there is no ivy cache in the container. When executing it, all dependent jars will be downloaded. If we use a similar mounting workaround, the complexity will increase significantly.
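For reference, such a mounting workaround would roughly look like the following (purely illustrative; the cache paths are assumptions, and this is exactly the extra complexity being avoided):

```bash
# Sketch only: reuse the host's ivy/sbt caches inside the container to
# avoid re-downloading dependency jars. Cache paths are assumptions.
docker run \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  --mount type=bind,source="${HOME}/.ivy2",target=/root/.ivy2 \
  --mount type=bind,source="${HOME}/.sbt",target=/root/.sbt \
  --interactive --tty "${IMG_URL}" \
  /bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"
```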
If we run `sbt compile` in the container, it will be very slow.
The script works well on my local machine, thanks @panbingkun.
@zhengruifeng I suppose the script is intended to be used by developers; if so, maybe just put it at
@@ -0,0 +1,71 @@
#!/usr/bin/env bash
The script does not seem to have the `x` permission; you can grant it with `chmod a+x <path>`, and git will record the permission correctly on UNIX-like OSes.
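For example (a sketch; `<path-to-script>` stands for the script file added in this PR):

```bash
# Grant execute permission and stage the change; git stores the mode bit.
chmod a+x <path-to-script>
git add <path-to-script>
git diff --cached --summary   # should show: mode change 100644 => 100755 <path-to-script>
```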
Okay
Updated.
Thank you very much for helping to verify again! ❤️
Merged to master. |
@@ -0,0 +1,71 @@
#!/usr/bin/env bash
Can we document this in docs/README.md?
Sure, let me do it as a follow-up PR.
Please review: #49013
thanks!
What changes were proposed in this pull request?
The PR aims to add a script to build docs with an image.
The overall idea is as follows (a rough sketch of each part follows below):
1. Mount local files into the container (this way, there is no need to copy the Spark files into the container, and the compiled Spark package is already prepared in the local Spark folder, so there is no need to compile it again in the container; otherwise it would re-download many dependency jars, which is very time-consuming).
2. Build `error docs`, `scala doc`, `python doc` and `sql doc` in the container.
3. Build `r docs` on the host.
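A rough sketch of the container-side flow, reusing the variable names from the diff above (the image tag and Dockerfile location are assumptions, not the script's exact contents):

```bash
# Sketch only: build the docs image locally, then build the error/scala/
# python/sql docs inside it against the mounted host checkout.
docker build --tag "${IMG_URL}" dev/spark-test-image/docs/
docker run \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  --interactive --tty "${IMG_URL}" \
  /bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"
```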
Why does the `r` doc need to be built outside the container? Because when building inside the container, the permissions of the directory `/__w/spark/spark/R/pkg/docs` automatically generated by `RScript` are `dr-xr--r-x`, and subsequent file writes throw an error such as:
`! [EACCES] Failed to copy '/usr/local/lib/R/site-library/pkgdown/BS5/assets/katex-auto.js' to '/__w/spark/spark/R/pkg/docs/katex-auto.js': permission denied`
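On the host, only the R docs (and the final site) then need to be built. A minimal sketch, assuming the existing `SKIP_*` flags mentioned in the review discussion:

```bash
# Sketch only: skip the docs already built in the container and build just
# the R API docs plus the final site on the host.
cd "${SPARK_HOME}/docs"
SKIP_ERRORDOC=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_SQLDOC=1 bundle exec jekyll build
```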
Why are the changes needed?
For PySpark developers, some Python libraries conflict between the environment for generating docs and the development environment. This script helps developers verify docs more easily.
Does this PR introduce any user-facing change?
No, only for Spark developers.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.