From 18a71332f27465f961a7a45139b13b488768703f Mon Sep 17 00:00:00 2001
From: Stavros Kontopoulos
Date: Fri, 8 Sep 2017 15:01:12 +0300
Subject: [PATCH] checkpoint

---
 docs/hdfs.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/hdfs.md b/docs/hdfs.md
index 4b513466..c08fff33 100644
--- a/docs/hdfs.md
+++ b/docs/hdfs.md
@@ -21,6 +21,15 @@ For more information, see [Inheriting Hadoop Cluster Configuration][8].
 
 For DC/OS HDFS, these configuration files are served at `http://<hdfs.framework-name>.marathon.mesos:<port>/v1/connection`, where `<hdfs.framework-name>` is a configuration variable set in the HDFS package, and `<port>` is the port of its marathon app.
 
+### Spark Checkpointing
+
+To use Spark with checkpointing, follow the instructions [here](https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing) and use an HDFS directory as the checkpoint directory. In your Spark application code, pass the HDFS directory of your choice, for example:
+```
+val ssc = ...
+ssc.checkpoint(checkpointDirectory)
+```
+The directory is created automatically on HDFS, and the Spark Streaming app will recover from checkpointed data even across application restarts.
+
 # HDFS Kerberos
 
 You can access external (i.e. non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.
@@ -87,7 +96,7 @@ Submit the job with the ticket:
 ```$bash
 dcos spark run --submit-args="\
 --kerberos-principal hdfs/name-0-node.hdfs.autoip.dcos.thisdcos.directory@LOCAL \
---tgt-secret-path /__dcos_base64__tgt
+--tgt-secret-path /__dcos_base64__tgt \
 --conf ... --class MySparkJob "
 ```
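
The `val ssc = ...` snippet in the patch elides the driver-recovery pattern that the linked Spark checkpointing guide recommends. A non-authoritative sketch of that pattern is below; the object name, app name, batch interval, and checkpoint path are illustrative assumptions, not part of this patch:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  // Hypothetical HDFS checkpoint path; point this at a directory on your
  // DC/OS HDFS cluster.
  val checkpointDirectory = "hdfs://hdfs/checkpoints/my-app"

  // Builds a fresh context; only invoked when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-example") // assumed name
    val ssc = new StreamingContext(conf, Seconds(10))           // assumed interval
    ssc.checkpoint(checkpointDirectory) // checkpoints are written to HDFS
    // ... define DStream sources and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On an application restart, getOrCreate rebuilds the streaming context
    // from the checkpointed data instead of calling createContext again.
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

`StreamingContext.getOrCreate` is what makes restarts safe: if the checkpoint directory is empty the app starts fresh, otherwise it resumes from the saved state.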