[SPARK-49034][CORE] Support server-side sparkProperties replacement in REST Submission API #47511

Closed

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jul 27, 2024

What changes were proposed in this pull request?

Like SPARK-49033, this PR aims to support server-side sparkProperties replacement in REST Submission API.

  • For example, ephemeral Spark clusters can use server-side environment variables to provide backend resources and information without touching client-side applications and configurations.

  • The placeholder pattern is {{SERVER_ENVIRONMENT_VARIABLE_NAME}}, in the same style as the following.

-verbose:gc -Xloggc:/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc

"org.apache.spark.deploy.worker.DriverWrapper",
Seq("{{WORKER_URL}}", "{{USER_JAR}}", mainClass) ++ appArgs, // args to the DriverWrapper
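The replacement idea can be sketched as follows. This is a minimal illustration in Python, not the code in StandaloneRestServer.scala: the function name `replace_placeholders`, the regular expression, and the choice to leave unknown placeholders untouched are all assumptions made for the example, as is the `example.unset` property key.

```python
import re

# Matches {{NAME}} where NAME looks like an environment variable name.
# The exact pattern Spark uses is not shown in this PR; this one is an assumption.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")


def replace_placeholders(properties, env):
    """Return a copy of `properties` with each {{NAME}} replaced by env[NAME].

    Assumption: a placeholder whose name is missing from `env` is left as-is.
    """
    def substitute(match):
        name = match.group(1)
        return env.get(name, match.group(0))

    return {key: PLACEHOLDER.sub(substitute, value)
            for key, value in properties.items()}


# Environment as seen by the (hypothetical) server process.
server_env = {"AWS_ENDPOINT_URL": "ENDPOINT_FOR_THIS_CLUSTER"}

submitted = {
    "spark.hadoop.fs.s3a.endpoint": "{{AWS_ENDPOINT_URL}}",
    # Hypothetical key, used only to show the unset-variable case.
    "example.unset": "{{NOT_SET_ON_SERVER}}",
}

resolved = replace_placeholders(submitted, server_env)
print(resolved["spark.hadoop.fs.s3a.endpoint"])
```

The point of doing this on the server is that the client only needs to know the variable name, while the value stays with the cluster that was started with it.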

Why are the changes needed?

A user can submit an environment variable placeholder like {{AWS_ENDPOINT_URL}} in order to use server-wide environment variables of the Spark Master.

$ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true" \
  AWS_ENDPOINT_URL=ENDPOINT_FOR_THIS_CLUSTER \
  sbin/start-master.sh

$ sbin/start-worker.sh spark://$(hostname):7077

curl -s -k -XPOST http://localhost:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
          "appResource": "",
          "sparkProperties": {
            "spark.master": "spark://localhost:7077",
            "spark.app.name": "",
            "spark.submit.deployMode": "cluster",
            "spark.hadoop.fs.s3a.endpoint": "{{AWS_ENDPOINT_URL}}",
            "spark.jars": "/Users/dongjoon/APACHE/spark-merge/examples/target/scala-2.13/jars/spark-examples_2.13-4.0.0-SNAPSHOT.jar"
          },
          "clientSparkVersion": "",
          "mainClass": "org.apache.spark.examples.SparkPi",
          "environmentVariables": {},
          "action": "CreateSubmissionRequest",
          "appArgs": [ "10000" ]
  }'
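For clients that do not shell out to curl, the same request can be built programmatically. The sketch below uses only the Python standard library and mirrors the JSON body above; the helper names `build_create_request` and `submit` and the jar path `/path/to/spark-examples.jar` are hypothetical, and `submit` is defined but not called because it needs a running Master with the REST server enabled.

```python
import json
import urllib.request


def build_create_request(jar_path, endpoint_placeholder="{{AWS_ENDPOINT_URL}}"):
    """Build the CreateSubmissionRequest body shown in the curl example."""
    return {
        "appResource": "",
        "sparkProperties": {
            "spark.master": "spark://localhost:7077",
            "spark.app.name": "",
            "spark.submit.deployMode": "cluster",
            # Sent literally; the server is expected to substitute the value.
            "spark.hadoop.fs.s3a.endpoint": endpoint_placeholder,
            "spark.jars": jar_path,
        },
        "clientSparkVersion": "",
        "mainClass": "org.apache.spark.examples.SparkPi",
        "environmentVariables": {},
        "action": "CreateSubmissionRequest",
        "appArgs": ["10000"],
    }


def submit(payload, url="http://localhost:6066/v1/submissions/create"):
    """POST the payload to the REST Submission API and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical jar location; replace with a real path before calling submit().
payload = build_create_request("/path/to/spark-examples.jar")
print(json.dumps(payload, indent=2))
```

Note that the client sends the placeholder text unchanged, so the same payload can be reused against clusters started with different AWS_ENDPOINT_URL values.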

Screenshot 2024-07-26 at 22 00 26

Does this PR introduce any user-facing change?

No. This is a new feature and is disabled by default via spark.master.rest.enabled (default: false).

How was this patch tested?

Pass the CIs with the newly added test case.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Jul 27, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as draft July 27, 2024 04:57
@dongjoon-hyun
Member Author

Could you review this PR too, @viirya and @yaooqinn ?

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review July 27, 2024 05:02
@dongjoon-hyun
Member Author

Thank you, @viirya !

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49034][CORE] Support server-side sparkProperties replacementin REST Submission API [SPARK-49034][CORE] Support server-side sparkProperties replacement in REST Submission API Jul 27, 2024
@dongjoon-hyun
Member Author

Thank you, @yaooqinn .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-49034-2 branch July 27, 2024 07:39
@dongjoon-hyun
Member Author

Merged to master for Apache Spark 4.0.0-preview2.

ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
… in REST Submission API

### What changes were proposed in this pull request?

Like SPARK-49033, this PR aims to support server-side `sparkProperties` replacement in REST Submission API.

- For example, ephemeral Spark clusters can use server-side environment variables to provide backend resources and information without touching client-side applications and configurations.

- The placeholder pattern is `{{SERVER_ENVIRONMENT_VARIABLE_NAME}}`, in the same style as the following.

https://github.com/apache/spark/blob/163e512c53208301a8511310023d930d8b77db96/docs/configuration.md?plain=1#L694

https://github.com/apache/spark/blob/163e512c53208301a8511310023d930d8b77db96/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L233-L234

### Why are the changes needed?

A user can submit an environment variable placeholder like `{{AWS_ENDPOINT_URL}}` in order to use server-wide environment variables of the Spark Master.

```
$ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true" \
  AWS_ENDPOINT_URL=ENDPOINT_FOR_THIS_CLUSTER \
  sbin/start-master.sh

$ sbin/start-worker.sh spark://$(hostname):7077
```

```
curl -s -k -XPOST http://localhost:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
          "appResource": "",
          "sparkProperties": {
            "spark.master": "spark://localhost:7077",
            "spark.app.name": "",
            "spark.submit.deployMode": "cluster",
            "spark.hadoop.fs.s3a.endpoint": "{{AWS_ENDPOINT_URL}}",
            "spark.jars": "/Users/dongjoon/APACHE/spark-merge/examples/target/scala-2.13/jars/spark-examples_2.13-4.0.0-SNAPSHOT.jar"
          },
          "clientSparkVersion": "",
          "mainClass": "org.apache.spark.examples.SparkPi",
          "environmentVariables": {},
          "action": "CreateSubmissionRequest",
          "appArgs": [ "10000" ]
  }'
```

- http://localhost:4040/environment/

![Screenshot 2024-07-26 at 22 00 26](https://github.com/user-attachments/assets/20ea5d98-2503-4969-8cdb-82938c706029)

### Does this PR introduce _any_ user-facing change?

No. This is a new feature and is disabled by default via `spark.master.rest.enabled` (default: `false`).

### How was this patch tested?

Pass the CIs with the newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47511 from dongjoon-hyun/SPARK-49034-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Jul 30, 2024
…I server-side env variable replacements

### What changes were proposed in this pull request?

This PR aims to document the following three recent improvements.
- #47491
- #47509
- #47511

### Why are the changes needed?

To provide updated documentation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and check the HTML manually.

<img width="926" alt="Screenshot 2024-07-29 at 14 10 40" src="https://github.com/user-attachments/assets/6c904ec0-0ece-432a-8e41-aeb88f7baab8">

<img width="932" alt="Screenshot 2024-07-29 at 13 52 20" src="https://github.com/user-attachments/assets/ca3afe9a-dcfe-4258-b455-9ff4781cb4e5">

<img width="940" alt="Screenshot 2024-07-29 at 13 52 29" src="https://github.com/user-attachments/assets/ad9635d4-c66f-4320-8b93-005443d4df2e">

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47523 from dongjoon-hyun/SPARK-49049.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
fusheng9399 pushed a commit to fusheng9399/spark that referenced this pull request Aug 6, 2024
… in REST Submission API

fusheng9399 pushed a commit to fusheng9399/spark that referenced this pull request Aug 6, 2024
…I server-side env variable replacements

szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
… in REST Submission API

szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
…I server-side env variable replacements

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
… in REST Submission API

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…I server-side env variable replacements

himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
… in REST Submission API

himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…I server-side env variable replacements
