This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Commit 18a7133: checkpoint
Stavros Kontopoulos committed Sep 29, 2017 (1 parent: de1c275)
Showing 1 changed file with 10 additions and 1 deletion: docs/hdfs.md
@@ -21,6 +21,15 @@ For more information, see [Inheriting Hadoop Cluster Configuration][8].

For DC/OS HDFS, these configuration files are served at `http://<hdfs.framework-name>.marathon.mesos:<port>/v1/connection`, where `<hdfs.framework-name>` is a configuration variable set in the HDFS package, and `<port>` is the port of its marathon app.
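For scripting against that endpoint, the URL can be assembled from the two variables described above. A minimal sketch (the helper name and the example values are illustrative, not part of the HDFS package):

```scala
// Build the /v1/connection URL from the framework name and the Marathon app port.
// "hdfs" and 9000 below are placeholder values; use your deployment's settings.
def connectionUrl(frameworkName: String, port: Int): String =
  s"http://$frameworkName.marathon.mesos:$port/v1/connection"

println(connectionUrl("hdfs", 9000))
// → http://hdfs.marathon.mesos:9000/v1/connection

// The served configuration could then be fetched from inside the cluster, e.g.:
// val conf = scala.io.Source.fromURL(connectionUrl("hdfs", 9000)).mkString
```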

### Spark Checkpointing

To use Spark with checkpointing, follow the instructions [here](https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing) and use an HDFS directory as the checkpoint directory. In your Spark application code, point the streaming context at the HDFS directory of your choice, for example:
```scala
val ssc = ...
ssc.checkpoint(checkpointDirectory)
```
That directory is created automatically on HDFS, and the Spark Streaming application will recover from the checkpointed data even across application restarts.
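Checkpoint recovery is typically wired up with `StreamingContext.getOrCreate`, as described in the linked guide. A minimal sketch, assuming an illustrative checkpoint path and batch interval (substitute your own):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical HDFS checkpoint directory; replace with your own path.
val checkpointDirectory = "hdfs://hdfs/checkpoints/my-app"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("MySparkJob")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDirectory) // enable checkpointing to HDFS
  // ... define the streaming computation here; a real application must
  // register at least one output operation before start() ...
  ssc
}

// On restart, rebuild the context from checkpointed data if it exists;
// otherwise create a fresh one via createContext.
val ssc = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
ssc.start()
ssc.awaitTermination()
```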

# HDFS Kerberos

You can access external (i.e. non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.
@@ -87,7 +96,7 @@ Submit the job with the ticket:
```bash
dcos spark run --submit-args="\
--kerberos-principal hdfs/name-0-node.hdfs.autoip.dcos.thisdcos.directory@LOCAL \
--tgt-secret-path /__dcos_base64__tgt \
--conf ... --class MySparkJob <url> <args>"
```

