
[SPARK-50295][INFRA] Add a script to build docs with image #48860

Closed

Conversation


@panbingkun (Contributor) commented Nov 15, 2024

What changes were proposed in this pull request?

This PR aims to add a script to build the docs with a Docker image.

The overall idea is as follows (a minimal sketch of the whole flow appears after this list):

  • Prepare the compiled Spark packages needed by the subsequent doc builds (on the host).
  • Build the image from the cache.
  • Run the image as a container.
    Mount the local Spark directory into the container (this way there is no need to copy the Spark files into the container, and since the compiled Spark package is already prepared in the local Spark folder, it does not need to be compiled again inside the container, which would otherwise re-download many dependency jars and be very time-consuming).
  • Generate the error docs, Scala doc, Python doc, and SQL doc in the container.
  • Generate the R docs on the host.
    Why do the R docs need to be built outside the container?
    Because when building inside the container, the directory /__w/spark/spark/R/pkg/docs automatically created by Rscript has permissions dr-xr--r-x, and subsequent file writes fail with an error like:
    ! [EACCES] Failed to copy '/usr/local/lib/R/site-library/pkgdown/BS5/assets/katex-auto.js' to '/__w/spark/spark/R/pkg/docs/katex-auto.js': permission denied
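
Below is a minimal sketch of that flow, assuming hypothetical names for the image tag, the mount target, and the in-container script; the actual script added by this PR may differ in its details:

#!/usr/bin/env bash
# Hypothetical sketch only: IMG_TAG, the mount target, and the in-container
# script name below are illustrative, not the PR's actual values.
set -euo pipefail

SPARK_HOME="$(pwd)"                        # run from the Spark repo root
IMG_TAG="apache-spark-ci-image-docs:local"

# 1. Prepare the compiled Spark packages on the host (reuses the host's sbt/Ivy caches).
build/sbt package                          # exact sbt tasks/profiles may differ

# 2. Build the image from the cache.
docker build --tag "${IMG_TAG}" dev/spark-test-image/docs/

# 3. Build error docs, Scala/Java, Python, and SQL docs inside the container,
#    with the local repo bind-mounted so nothing is copied or recompiled there.
docker run --rm \
  --mount type=bind,source="${SPARK_HOME}",target=/__w/spark/spark \
  "${IMG_TAG}" /bin/bash -c "sh /__w/spark/spark/dev/build-docs-in-container.sh"

# 4. Build the R docs on the host to avoid the read-only R/pkg/docs permission issue.
(cd docs && SKIP_ERRORDOC=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_SQLDOC=1 \
  bundle exec jekyll build)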

Why are the changes needed?

For PySpark developers, some Python libraries in the docs-generation environment conflict with those in the development environment. This change helps developers verify the docs more easily.

Does this PR introduce any user-facing change?

No, it is only for Spark developers.

How was this patch tested?

  • Pass GA.
  • Manual test (the verification process can be found in the comments below).

Was this patch authored or co-authored using generative AI tooling?

No.


# 3.build docs on container: `error docs`, `scala doc`, `python doc`, `sql doc`
docker run \
--mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
Member:

If the container is going to write files to the mounted path, please make sure the permissions won't bother the user accessing/deleting them from the host. For example, if the container writes files as uid 0, the host user may have no permission to delete them.
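
One possible mitigation (a hedged sketch, not necessarily what this PR does) is to run the container with the host user's uid/gid so that files written into the bind mount stay owned by the invoking user:

# Hypothetical: run as the host user so files written to the mount are not owned by root.
docker run --rm \
  --user "$(id -u):$(id -g)" \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  "${IMG_URL}" /bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"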

Contributor Author:

(base) ➜  spark-community git:(SPARK-50295) ✗ ls -al docs/api
total 0
drwxr-xr-x    7 panbingkun  staff   224 Nov 18 19:52 .
drwx------@ 286 panbingkun  staff  9152 Nov 18 19:25 ..
drwxr-xr-x@  22 panbingkun  staff   704 Nov 18 19:52 R
drwxr-xr-x   26 panbingkun  staff   832 Nov 18 19:25 java
drwxr-xr-x   15 panbingkun  staff   480 Nov 18 19:47 python
drwxr-xr-x    6 panbingkun  staff   192 Nov 18 19:25 scala
drwxr-xr-x   11 panbingkun  staff   352 Nov 18 19:49 sql

@panbingkun commented Nov 18, 2024

The verification process is as follows:

  • Run the following command:
sh dev/spark-test-image/docs/build-docs-on-local
  • The output of the run is as follows:
[info] Note: Some input files use or override a deprecated API.
[info] Note: Recompile with -Xlint:deprecation for details.
[warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
[success] Total time: 46 s, completed Nov 18, 2024, 7:23:38 PM
[+] Building 93.0s (13/13) FINISHED                                                                                                                                               docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                              0.0s
 => => transferring dockerfile: 3.81kB                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/ubuntu:jammy-20240911.1                                                                                                                       88.5s
 => [internal] load .dockerignore                                                                                                                                                                 0.0s
 => => transferring context: 2B                                                                                                                                                                   0.0s
 => importing cache manifest from ghcr.io/apache/spark/apache-spark-github-action-image-docs-cache:master                                                                                         4.4s
 => => inferred cache manifest type: application/vnd.oci.image.index.v1+json                                                                                                                      0.0s
 => [1/7] FROM docker.io/library/ubuntu:jammy-20240911.1@sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97                                                                  0.0s
 => => resolve docker.io/library/ubuntu:jammy-20240911.1@sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97                                                                  0.0s
 => [auth] apache/spark/apache-spark-github-action-image-docs-cache:pull token for ghcr.io                                                                                                        0.0s
 => CACHED [2/7] RUN apt-get update && apt-get install -y     build-essential     ca-certificates     curl     gfortran     git     gnupg     libcurl4-openssl-dev     libfontconfig1-dev     li  0.0s
 => CACHED [3/7] RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', 'rmarkdown', 'testthat'), repos='https://cloud.r-project.org/')" &&     Rscript -e "devtools::install_versi  0.0s
 => CACHED [4/7] RUN add-apt-repository ppa:deadsnakes/ppa                                                                                                                                        0.0s
 => CACHED [5/7] RUN apt-get update && apt-get install -y python3.9 python3.9-distutils     && rm -rf /var/lib/apt/lists/*                                                                        0.0s
 => CACHED [6/7] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9                                                                                                                    0.0s
 => CACHED [7/7] RUN python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0'   ipython ipython_genutils  0.0s
 => exporting to image                                                                                                                                                                            0.0s
 => => exporting layers                                                                                                                                                                           0.0s
 => => exporting manifest sha256:86549617bcf8050c8b39402be5679e3663adf07de19894b872f11598c173c935                                                                                                 0.0s
 => => exporting config sha256:f8e2afeca787583d05cb7572d71d7ccff2fb8c7f8ec7a4b2e6f61d6fb3061d8d                                                                                                   0.0s
 => => exporting attestation manifest sha256:4998eaa40eace447f735eccf12114f8b63d9b975e33aa98133ee9944f7b3751d                                                                                     0.0s
 => => exporting manifest list sha256:c319e5c6347755831e6cf998bff702f737769a3cad4839965c9c0322f50f7ea7                                                                                            0.0s
 => => naming to docker.io/apache/spark/apache-spark-ci-image-docs:1731928756                                                                                                                     0.0s
 => => unpacking to docker.io/apache/spark/apache-spark-ci-image-docs:1731928756                                                                                                                  0.0s

 5 warnings found (use docker --debug to expand):
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 30)
 - UndefinedVar: Usage of undefined variable '$R_LIBS_SITE' (line 75)
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 75)
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 27)
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 29)
Fetching bundler-2.4.22.gem
Successfully installed bundler-2.4.22
Parsing documentation for bundler-2.4.22
Installing ri documentation for bundler-2.4.22
Done installing documentation for bundler after 0 seconds
1 gem installed
Don't run Bundler as root. Installing your bundle as root will break this application for all non-root users on this machine.
Bundle complete! 4 Gemfile dependencies, 32 gems now installed.
Bundled gems are installed into `./.local_ruby_bundle`
Configuration file: /__w/spark/spark/docs/_config.yml
************************
* Building error docs. *
************************
Generated: docs/_generated/error-conditions.html
*************************************
* Building Scala and Java API docs. *
*************************************
Moving back into docs dir.
Removing old docs
Making directory api/scala
cp -r ../target/scala-2.13/unidoc/. api/scala
Making directory api/java
cp -r ../target/javaunidoc/. api/java
Updating JavaDoc files for badge post-processing
Copying jquery.min.js from Scala API to Java API for page post-processing of badges
Copying api_javadocs.js to Java API for page post-processing of badges
Appending content of api-javadocs.css to JavaDoc stylesheet.css for badge styles
*****************************
* Building Python API docs. *
*****************************
Running Sphinx v4.5.0
/__w/spark/spark/python/pyspark/pandas/__init__.py:43: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
  warnings.warn(
loading pickled environment... done
[autosummary] generating autosummary for: development/contributing.rst, development/debugging.rst, development/errors.rst, development/index.rst, development/logger.rst, development/setting_ide.rst, development/testing.rst, getting_started/index.rst, getting_started/install.rst, getting_started/quickstart_connect.ipynb, ..., user_guide/pandas_on_spark/transform_apply.rst, user_guide/pandas_on_spark/typehints.rst, user_guide/pandas_on_spark/types.rst, user_guide/python_packaging.rst, user_guide/sql/arrow_pandas.rst, user_guide/sql/dataframe_column_selections.rst, user_guide/sql/index.rst, user_guide/sql/python_data_source.rst, user_guide/sql/python_udtf.rst, user_guide/sql/type_conversions.rst
[autosummary] generating autosummary for: /__w/spark/spark/python/docs/source/reference/api/pyspark.Accumulator.add.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.Accumulator.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.Accumulator.value.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.AccumulatorParam.addInPlace.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.AccumulatorParam.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.AccumulatorParam.zero.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.allGather.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.attemptNumber.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.barrier.rst, /__w/spark/spark/python/docs/source/reference/api/pyspark.BarrierTaskContext.cpus.rst, ..., /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQuery.status.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQuery.stop.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryListener.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.active.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.addListener.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.awaitAnyTermination.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.get.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.removeListener.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.resetTerminated.rst, /__w/spark/spark/python/docs/source/reference/pyspark.ss/api/pyspark.sql.streaming.StreamingQueryManager.rst
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2298 source files that are out of date
updating environment: 0 added, 2298 changed, 0 removed
reading sources... [100%] reference/pyspark.sql/spark_session .. user_guide/pandas_on_spark/supported_pandas_api


looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] reference/pyspark.ss/api/pyspark.sql.streaming.DataStreamReader .. user_guide/pandas_on_spark/supported_pandas_api
waiting for workers...
generating indices... done
highlighting module code... [100%] pyspark.util
writing additional pages... search done
copying images... [100%] ../../../docs/img/pyspark-spark_core_and_rdds.png
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.

The HTML pages are in build/html.
Moving back into docs dir.
Making directory api/python
cp -r ../python/docs/build/html/. api/python
**************************
* Building SQL API docs. *
**************************
Generating SQL API Markdown files.
WARNING: Using incubator modules: jdk.incubator.vector


    SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
    SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
    SELECT xpath_boolean('<a><b>1</b></a>','a/b');
    SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
    SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
    SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
    SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
    SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
    SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
    SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
Generating HTML files for SQL API documentation.
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: /__w/spark/spark/sql/site
INFO    -  Documentation built in 0.79 seconds
/__w/spark/spark/sql
Moving back into docs dir.
Making directory api/sql
cp -r ../sql/site/. api/sql
            Source: /__w/spark/spark/docs
       Destination: /__w/spark/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 23.136 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
Configuration file: /Users/panbingkun/Developer/spark/spark-community/docs/_config.yml
************************
* Building R API docs. *
************************
Using Scala 2.13
Using R_SCRIPT_PATH = /usr/local/bin


── Installing package SparkR into temporary library ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Building pkgdown site for package SparkR ────────────────────────────────────
Reading from: /Users/panbingkun/Developer/spark/spark-community/R/pkg
Writing to: /Users/panbingkun/Developer/spark/spark-community/R/pkg/docs
── Sitrep ──────────────────────────────────────────────────────────────────────
✖ URLs not ok.
  In DESCRIPTION, URL is missing package url
  (https://spark.apache.org/docs/4.0.0/api/R).
  See details in `vignette(pkgdown::metadata)`.
✔ Favicons ok.
✔ Open graph metadata ok.
✔ Articles metadata ok.
✔ Reference metadata ok.
── Initialising site ───────────────────────────────────────────────────────────
── Building home ───────────────────────────────────────────────────────────────
Reading README.md
Writing 404.html
── Building function reference ─────────────────────────────────────────────────
Warning: SparkR is deprecated in Apache Spark 4.0.0 and will be removed in a future release. To continue using Spark in R, we recommend using sparklyr instead: https://spark.posit.co/get-started/

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var, window


Reading man/write.stream.Rd
Reading man/write.text.Rd
── Building articles ───────────────────────────────────────────────────────────
Reading vignettes/sparkr-vignettes.Rmd
Writing articles/sparkr-vignettes.html
── Building sitemap ────────────────────────────────────────────────────────────
── Building redirects ──────────────────────────────────────────────────────────
── Building search index ───────────────────────────────────────────────────────
── Checking for problems ───────────────────────────────────────────────────────
── Finished building pkgdown site for package SparkR ───────────────────────────
Warning messages:
1: Failed to parse usage: `` array_aggregate(x, initialValue, merge, ...) array_contains(x, value) array_distinct(x) array_except(x, y) array_exists(x, f) array_forall(x, f) array_filter(x, f)
array_intersect(x, y) array_join(x, delimiter, ...) array_max(x) array_min(x) array_position(x, value) array_remove(x, value) array_repeat(x, count) array_sort(x, ...) array_transform(x, f)
arrays_overlap(x, y) array_union(x, y) arrays_zip(x, ...) arrays_zip_with(x, y, f) concat(x, ...) element_at(x, extraction) explode(x) explode_outer(x) flatten(x) from_json(x, schema, ...)
from_csv(x, schema, ...) map_concat(x, ...) map_entries(x) map_filter(x, f) map_from_arrays(x, y) map_from_entries(x) map_keys(x) map_values(x) map_zip_with(x, y, f) posexplode(x)
posexplode_outer(x) reverse(x) schema_of_csv(x, ...) schema_of_json(x, ...) shuffle(x) size(x) slice(x, start, length) sort_array(x, asc = TRUE) transform_keys(x, f) transform_values(x, f)
to_json(x, ...) to_csv(x, ...) S4method(`reverse`, list(`Column`))(x) S4method(`to_json`, list(`Column`))(x, ...) S4method(`to_csv`, list(`Column`))(x, ...) S4method(`concat`, list(`Column`))(x,
...) S4method(`from_json`, list(`Column`,`characterOrstructTypeOrColumn`))(x, schema, as.json.array = FALSE, ...) S4method(`schema_of_json`, list(`characterOrColumn`))(x, ...) S4method(`from_csv`,
list(`Column`,`characterOrstructTypeOrColumn`))(x, schema, ...) S4method(`schema_of_csv`, list(`characterOrColumn`))(x, ...) S4method(`array_aggregate`,
list(`characterOrColumn`,`Column`,``function``))(x, initialValue, merge, finish = NULL) S4method(`array_contains`, list(`Column`))(x, value) S4method(`array_distinct`, list(`Column`))(x)
S4method(`array_except`, list(`Column`,`Column`))(x, y) S4method(`array_exists`, list(`characterOrColumn`,``function``))(x, f) S4method(`array_filter`, list(`characterOrColumn`,``function``))(x, f)
S4method(`array_forall`, list(`characterOrColumn`,``function``))(x, f) S4method(`array_intersect`, list(`Column`,`Column`))(x, y) S4method(`array_join`, list(`Column`,`character`))(x, delimiter,
nullReplacement = NULL) S4method(`array_max`, list(`Column`))(x) S4method(`array_min`, list(`Column`))(x) S4method(`array_position`, list(`Column`))(x, value) S4method(`array_remove`,
list(`Column`))(x, value) S4method(`array_repeat`, list(`Column`,`numericOrColumn`))(x, count) S4method(`array_sort`, list(`Column`))(x, comparator = NULL) S4method(`array_transform`,
list(`characterOrColumn`,``function``))(x, f) S4method(`arrays_overlap`, list(`Column`,`Column`))(x, y) S4method(`array_union`, list(`Column`,`Column`))(x, y) S4method(`arrays_zip`,
list(`Column`))(x, ...) S4method(`arrays_zip_with`, list(`characterOrColumn`,`characterOrColumn`,``function``))(x, y, f) S4method(`shuffle`, list(`Column`))(x) S4method(`flatten`, list(`Column`))(x)
S4method(`map_concat`, list(`Column`))(x, ...) S4method(`map_entries`, list(`Column`))(x) S4method(`map_filter`, list(`characterOrColumn`,``function``))(x, f) S4method(`map_from_arrays`,
list(`Column`,`Column`))(x, y) S4method(`map_from_entries`, list(`Column`))(x) S4method(`map_keys`, list(`Column`))(x) S4method(`transform_keys`, list(`characterOrColumn`,``function``))(x, f)
S4method(`transform_values`, list(`characterOrColumn`,``function``))(x, f) S4method(`map_values`, list(`Column`))(x) S4method(`map_zip_with`,
list(`characterOrColumn`,`characterOrColumn`,``function``))(x, y, f) S4method(`element_at`, list(`Column`))(x, extraction) S4method(`explode`, list(`Column`))(x) S4method(`size`, list(`Column`))(x)
S4method(`slice`, list(`Column`))(x, start, length) S4method(`sort_array`, list(`Column`))(x, asc = TRUE) S4method(`posexplode`, list(`Column`))(x) S4method(`explode_outer`, list(`Column`))(x)
S4method(`posexplode_outer`, list(`Column`))(x) ``
2: Failed to parse usage: `` dapply(x, func, schema) S4method(`dapply`, list(`SparkDataFrame`,``function``,`characterOrstructType`))(x, func, schema) ``
3: Failed to parse usage: `` dapplyCollect(x, func) S4method(`dapplyCollect`, list(`SparkDataFrame`,``function``))(x, func) ``
+ rm ../_pkgdown.yml
+ popd
~/Developer/spark/spark-community/R ~/Developer/spark/spark-community/R ~/Developer/spark/spark-community/R
+ popd
~/Developer/spark/spark-community/R ~/Developer/spark/spark-community/R
Moving back into docs dir.
Making directory api/R
cp -r ../R/pkg/docs/. api/R
            Source: /Users/panbingkun/Developer/spark/spark-community/docs
       Destination: /Users/panbingkun/Developer/spark/spark-community/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 14.093 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
Untagged: apache/spark/apache-spark-ci-image-docs:1731928756
Deleted: sha256:c319e5c6347755831e6cf998bff702f737769a3cad4839965c9c0322f50f7ea7
Build doc done.

@zhengruifeng (Contributor) left a comment:

Dumb question: can we move the scripts to another directory?

@panbingkun:

Dumb question: can we move the scripts to another directory?

Allow me to try it.

@panbingkun:

Dumb question: can we move the scripts to another directory?

I have moved the script from dev/spark-test-image/docs to dev/spark-test-image-utils/docs, and the local testing is okay.

@panbingkun panbingkun marked this pull request as ready for review November 26, 2024 02:48
@panbingkun:

cc @HyukjinKwon @LuciferYang

@LuciferYang:

will verify the script later

@panbingkun:

will verify the script later

Thank you very much! ❤️

@LuciferYang:

The script can be executed successfully, thank you very much, @panbingkun.

However, should the final generated results only exist in the docs/_site/ directory? It seems that copies of the generated .html files exist in many other places, such as the sql/site/ directory and the docs/api directory. Additionally, since these files cannot currently be cleaned up with commands like sbt clean or mvn clean, many extra .html files are left in the project workspace after each build.

@panbingkun:

The script can be executed successfully, thank you very much, @panbingkun.

However, should the final generated results only exist in the docs/_site/ directory? It seems that copies of the generated .html files exist in many other places, such as the sql/site/ directory and the docs/api directory. Additionally, since these files cannot currently be cleaned up with commands like sbt clean or mvn clean, many extra .html files are left in the project workspace after each build.

@LuciferYang Thank you very much for helping to verify! ❤️

I think the above issue is due to a pre-existing problem in the script build_api_docs.rb itself, as follows:

puts "Making directory api/sql"
mkdir_p "api/sql"
puts "cp -r ../sql/site/. api/sql"
cp_r("../sql/site/.", "api/sql")

Can we solve this issue with a new separate PR?

@LuciferYang:

Yeah, it's fine for me to add some cleanup logic in a separate follow-up. It would be friendlier for local builds.
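
A possible shape for that cleanup (a hedged sketch only; the actual follow-up may differ) would be to remove the intermediate copies after the site is generated:

# Hypothetical cleanup of intermediate doc outputs left outside docs/_site;
# the paths are taken from the build log above, but the exact list may differ.
rm -rf "${SPARK_HOME}/sql/site" \
       "${SPARK_HOME}/docs/api" \
       "${SPARK_HOME}/docs/_generated" \
       "${SPARK_HOME}/python/docs/build" \
       "${SPARK_HOME}/R/pkg/docs"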

@LuciferYang:

LGTM, but it would be best to wait for @zhengruifeng to take another look.

@panbingkun:

Add a note (I communicated privately with @LuciferYang and confirmed this).

  • If you encounter an error like the following:
ERROR: failed to solve: ubuntu:jammy-20240911.1: failed to resolve source metadata for docker.io/library/ubuntu:jammy-20240911.1: failed to authorize: failed to fetch oauth token: Post "https://auth.docker.io/token": read tcp 192.168.1.23:49300->54.236.113.205:443: read: connection reset by peer
  • please add registry-mirrors to the file ~/.docker/daemon.json:
(base) ➜  .docker pwd
/Users/panbingkun/.docker
(base) ➜  .docker cat daemon.json
{
  "builder": {
    "gc": {
      "defaultKeepStorage": "20GB",
      "enabled": true
    }
  },
  "experimental": false,
  "registry-mirrors": [
    "https://registry.docker-cn.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://dockerhub.azk8s.cn",
    "https://mirror.ccs.tencentyun.com",
    "https://registry.cn-hangzhou.aliyuncs.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://docker.1panel.live",
    "https://atomhub.openatom.cn/",
    "https://hub.uuuadc.top",
    "https://docker.anyhub.us.kg",
    "https://dockerhub.jobcher.com",
    "https://dockerhub.icu",
    "https://docker.ckyl.me",
    "https://docker.awsl9527.cn"
  ]
}
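
After editing daemon.json, restart Docker and confirm the mirrors are active; for example (hedged):

# Verify the configured registry mirrors took effect after restarting Docker.
docker info | grep -A 15 "Registry Mirrors"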

--interactive --tty "${IMG_URL}" \
/bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"

# 4.Build docs on host: `r doc`.
Member:

Given that SparkR is deprecated and has fewer changes, how about respecting SKIP_RDOC here, so that developers don't need to have an R environment installed on the host?

@panbingkun commented Nov 27, 2024

Good point, I have also thought about this. What do you all think about it?
This would make the script look much more concise and clean!

Member:

On second thought, we can also pass SKIP_ERRORDOC, SKIP_SCALADOC, SKIP_PYTHONDOC, and SKIP_SQLDOC through into the container, so that all existing flags still work as-is.
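
One way to forward those flags (a hedged sketch built on the docker run invocation shown in the diff above) is to pass them through as environment variables:

# Hypothetical pass-through of the existing doc-build flags into the container;
# `--env VAR` without a value forwards the variable only if it is set on the host.
docker run --rm \
  --env SKIP_ERRORDOC --env SKIP_SCALADOC --env SKIP_PYTHONDOC --env SKIP_SQLDOC \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  --interactive --tty "${IMG_URL}" \
  /bin/bash -c "sh ${BUILD_DOCS_SCRIPT_PATH}"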

Contributor Author:

There is a difference: we do not want to run sbt compile in the container because there is no Ivy cache there, so executing it would re-download all the dependency jars. If we used a similar mounting workaround for the caches, the complexity would increase significantly.
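
For context, that mounting workaround would look roughly like the following (a hedged sketch; the cache locations are typical defaults and vary per machine), which is part of why compilation stays on the host:

# Hypothetical: mount the host's Ivy/Coursier/sbt caches so that an sbt compile
# inside the container would not re-download every dependency jar.
docker run --rm \
  --mount type=bind,source="${HOME}/.ivy2",target=/root/.ivy2 \
  --mount type=bind,source="${HOME}/.cache/coursier",target=/root/.cache/coursier \
  --mount type=bind,source="${HOME}/.sbt",target=/root/.sbt \
  --mount type=bind,source="${SPARK_HOME}",target="${DOCKER_MOUNT_SPARK_HOME}" \
  "${IMG_URL}" /bin/bash -c "cd ${DOCKER_MOUNT_SPARK_HOME} && build/sbt compile"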

Contributor Author:

If we run sbt compile in the container, it will be very slow.

@pan3793 commented Nov 27, 2024

the script works well on my local machine, thanks @panbingkun

@pan3793 commented Nov 27, 2024

Dumb question: can we move the scripts to another directory?

@zhengruifeng I suppose the script is intended to be used by developers; if so, maybe just put it at dev/build-docs?

@@ -0,0 +1,71 @@
#!/usr/bin/env bash
Member:

The script does not seem to have the x permission; you can grant it with chmod a+x <path>, and git will record the permission correctly on UNIX-like OSes.
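
For example (hedged; the path below is the script location shown earlier in this PR and may have since moved):

# Grant the executable bit; git records the mode change (100644 -> 100755).
chmod a+x dev/spark-test-image/docs/build-docs-on-local
git diff            # shows "old mode 100644 / new mode 100755" for the script
# Alternatively, set the bit directly in the index:
git update-index --chmod=+x dev/spark-test-image/docs/build-docs-on-local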

Contributor Author:

Okay

Contributor Author:

Updated.

@panbingkun:

the script works well on my local machine, thanks @panbingkun

Thank you very much for helping to verify again! ❤️

@panbingkun:

Merged to master.
Thank you, @zhengruifeng, @LuciferYang, @pan3793 .

@@ -0,0 +1,71 @@
#!/usr/bin/env bash
Member:

Can we document this in docs/README.md?

@panbingkun commented Nov 29, 2024

Sure, let me do it as a follow-up PR.

Contributor Author:

Please review: #49013
thanks!
