From 18a71332f27465f961a7a45139b13b488768703f Mon Sep 17 00:00:00 2001
From: Stavros Kontopoulos
Date: Fri, 8 Sep 2017 15:01:12 +0300
Subject: [PATCH] checkpoint

---
 docs/hdfs.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/hdfs.md b/docs/hdfs.md
index 4b513466..c08fff33 100644
--- a/docs/hdfs.md
+++ b/docs/hdfs.md
@@ -21,6 +21,15 @@ For more information, see [Inheriting Hadoop Cluster Configuration][8].
 
 For DC/OS HDFS, these configuration files are served at `http://<hdfs.framework-name>.marathon.mesos:<port>/v1/connection`, where `<hdfs.framework-name>` is a configuration variable set in the HDFS package, and `<port>` is the port of its marathon app.
 
+### Spark Checkpointing
+
+To use Spark with checkpointing, follow the instructions [here](https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing) and use an HDFS directory as the checkpoint directory. In your Spark application code, pass the HDFS directory of your choice, for example:
+```
+val ssc = ...
+ssc.checkpoint(checkpointDirectory)
+```
+The directory is created automatically on HDFS, and the Spark Streaming app will recover from checkpointed data even across application restarts.
+
 # HDFS Kerberos
 
 You can access external (i.e. non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.
@@ -87,7 +96,7 @@ Submit the job with the ticket:
 ```$bash
 dcos spark run --submit-args="\
 --kerberos-principal hdfs/name-0-node.hdfs.autoip.dcos.thisdcos.directory@LOCAL \
---tgt-secret-path /__dcos_base64__tgt
+--tgt-secret-path /__dcos_base64__tgt \
 --conf ... --class MySparkJob "
 ```
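
The `val ssc = ...` snippet in the patch elides the driver-recovery pattern that the linked Spark checkpointing guide recommends. A non-authoritative sketch of that pattern is below; the object name, app name, batch interval, and checkpoint path are illustrative assumptions, not part of this patch:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  // Hypothetical HDFS checkpoint path; point this at a directory on your
  // DC/OS HDFS cluster.
  val checkpointDirectory = "hdfs://hdfs/checkpoints/my-app"

  // Builds a fresh context; only invoked when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-example") // assumed name
    val ssc = new StreamingContext(conf, Seconds(10))           // assumed interval
    ssc.checkpoint(checkpointDirectory) // checkpoints are written to HDFS
    // ... define DStream sources and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On an application restart, getOrCreate rebuilds the streaming context
    // from the checkpointed data instead of calling createContext again.
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

`StreamingContext.getOrCreate` is what makes restarts safe: if the checkpoint directory is empty the app starts fresh, otherwise it resumes from the saved state.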