diff --git a/dev/run-tests b/dev/run-tests
index a5dcacb4fd0c1..00038718335a1 100755
--- a/dev/run-tests
+++ b/dev/run-tests
@@ -30,7 +30,7 @@ set -e
echo "========================================================================="
echo "Running Spark unit tests"
echo "========================================================================="
-sbt/sbt assembly test
+sbt/sbt clean assembly test
echo "========================================================================="
echo "Running PySpark tests"
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 4985c52a11ada..f9904d45013f6 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -58,11 +58,21 @@ do is as follows.
+First, we import the Spark Streaming classes, along with some implicit
+conversions from StreamingContext, into our environment. The conversions add
+useful methods to other classes we need (like DStream).
-First, we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object,
-which is the main entry point for all streaming
-functionality. Besides Spark's configuration, we specify that any DStream will be processed
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+main entry point for all streaming functionality.
+
+{% highlight scala %}
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.StreamingContext._
+{% endhighlight %}
+
+Then we create a
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+Besides Spark's configuration, we specify that any DStream will be processed
in 1 second batches.
{% highlight scala %}
@@ -88,7 +98,7 @@ val words = lines.flatMap(_.split(" "))
{% endhighlight %}
`flatMap` is a one-to-many DStream operation that creates a new DStream by
-generating multiple new records from each record int the source DStream. In this case,
+generating multiple new records from each record in the source DStream. In this case,
each line will be split into multiple words and the stream of words is represented as the
`words` DStream. Next, we want to count these words.
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print a few of the counts to the console
-wordCount.print()
+wordCounts.print()
{% endhighlight %}
The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -178,7 +188,7 @@ JavaPairDStream wordCounts = pairs.reduceByKey(
return i1 + i2;
}
});
-wordCount.print(); // Print a few of the counts to the console
+wordCounts.print(); // Print a few of the counts to the console
{% endhighlight %}
The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
+If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
+shell, you should start the shell with the Spark configuration set to
+discard old batches periodically:
+
+{% highlight bash %}
+$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
+{% endhighlight %}
+
+Then create your StreamingContext by wrapping the existing interactive shell
+SparkContext object, `sc`:
+
+{% highlight scala %}
+val ssc = new StreamingContext(sc, Seconds(1))
+{% endhighlight %}
+
+When working with the shell, you may also need to send a `^D` to your netcat session
+to force the pipeline to print the word counts to the console at the sink.
+
***************************************************************************************************
# Basics
@@ -428,9 +456,9 @@ KafkaUtils.createStream(javaStreamingContext, kafkaParams, ...);
-For more details on these additional sources, see the corresponding [API documentation]
-(#where-to-go-from-here). Furthermore, you can also implement your own custom receiver
-for your sources. See the [Custom Receiver Guide](streaming-custom-receivers.html).
+For more details on these additional sources, see the corresponding [API documentation](#where-to-go-from-here).
+Furthermore, you can also implement your own custom receiver for your sources. See the
+[Custom Receiver Guide](streaming-custom-receivers.html).
## Operations
There are two kinds of DStream operations - _transformations_ and _output operations_. Similar to
@@ -511,7 +539,7 @@ common ones are as follows.