This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Spark checkpoint docs #181

Merged 1 commit on Oct 5, 2017
12 changes: 11 additions & 1 deletion docs/hdfs.md
@@ -21,6 +21,16 @@ For more information, see [Inheriting Hadoop Cluster Configuration][8].

For DC/OS HDFS, these configuration files are served at `http://<hdfs.framework-name>.marathon.mesos:<port>/v1/connection`, where `<hdfs.framework-name>` is a configuration variable set in the HDFS package, and `<port>` is the port of its marathon app.

### Spark Checkpointing

To use Spark with checkpointing, follow the instructions in the [Spark Streaming programming guide](https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing) and use an HDFS directory as the checkpoint directory. For example:
```scala
val checkpointDirectory = "hdfs://hdfs/checkpoint"
val ssc = ...
ssc.checkpoint(checkpointDirectory)
```
The checkpoint directory is created on HDFS automatically, and the Spark Streaming application can recover from the checkpointed data even across application restarts or failures.
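To actually recover after a restart, the Spark Streaming docs recommend building the context through `StreamingContext.getOrCreate`, which rebuilds from the checkpoint when one exists. The sketch below assumes the same `hdfs://hdfs/checkpoint` path as the example above; the app name, batch interval, and `createContext` helper are illustrative placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Same checkpoint directory as above; adjust for your cluster.
val checkpointDirectory = "hdfs://hdfs/checkpoint"

// Called only when no checkpoint exists yet: build a fresh context.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpointed-stream") // hypothetical app name
  val ssc = new StreamingContext(conf, Seconds(10))            // illustrative batch interval
  ssc.checkpoint(checkpointDirectory)
  // ... define your DStream sources and transformations here ...
  ssc
}

// On restart, getOrCreate reconstructs the context from the checkpointed
// data instead of invoking createContext() again.
val ssc = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
ssc.start()
ssc.awaitTermination()
```

Note that checkpoint recovery only restores state defined inside the creating function, so all DStream setup must happen within `createContext`.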

# HDFS Kerberos

You can access external (i.e. non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.
@@ -87,7 +97,7 @@ Submit the job with the ticket:
```$bash
dcos spark run --submit-args="\
--kerberos-principal hdfs/name-0-node.hdfs.autoip.dcos.thisdcos.directory@LOCAL \
--tgt-secret-path /__dcos_base64__tgt \
--conf ... --class MySparkJob <url> <args>"
```
