diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index e7a166c3014c1..5d6f9c042248b 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -593,7 +593,7 @@ setMethod("cache",
 #'
 #' Persist this SparkDataFrame with the specified storage level. For details of the
 #' supported storage levels, refer to
-#' \url{http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence}.
+#' \url{http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence}.
 #'
 #' @param x the SparkDataFrame to persist.
 #' @param newLevel storage level chosen for the persistance. See available options in
diff --git a/R/pkg/R/RDD.R b/R/pkg/R/RDD.R
index 7ad3993e9ecbc..15ca212acf87f 100644
--- a/R/pkg/R/RDD.R
+++ b/R/pkg/R/RDD.R
@@ -227,7 +227,7 @@ setMethod("cacheRDD",
 #'
 #' Persist this RDD with the specified storage level. For details of the
 #' supported storage levels, refer to
-#'\url{http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence}.
+#'\url{http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence}.
 #'
 #' @param x The RDD to persist
 #' @param newLevel The new storage level to be assigned
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 76aa7b405e18c..46225dc598da8 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -27,7 +27,7 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT
 [EdgeContext]: api/scala/index.html#org.apache.spark.graphx.EdgeContext
 [GraphOps.collectNeighborIds]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]]
 [GraphOps.collectNeighbors]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]]
-[RDD Persistence]: programming-guide.html#rdd-persistence
+[RDD Persistence]: rdd-programming-guide.html#rdd-persistence
 [Graph.cache]: api/scala/index.html#org.apache.spark.graphx.Graph@cache():Graph[VD,ED]
 [GraphOps.pregel]: api/scala/index.html#org.apache.spark.graphx.GraphOps@pregel[A](A,Int,EdgeDirection)((VertexId,VD,A)⇒VD,(EdgeTriplet[VD,ED])⇒Iterator[(VertexId,A)],(A,A)⇒A)(ClassTag[A]):Graph[VD,ED]
 [PartitionStrategy]: api/scala/index.html#org.apache.spark.graphx.PartitionStrategy$
diff --git a/docs/index.md b/docs/index.md
index 07b6b171014ed..2d4607b3119bd 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -87,7 +87,7 @@ options for deployment:
 **Programming Guides:**
 
 * [Quick Start](quick-start.html): a quick introduction to the Spark API; start here!
-* [RDD Programming Guide](programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
+* [RDD Programming Guide](rdd-programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
 * [Spark SQL, Datasets, and DataFrames](sql-programming-guide.html): processing structured data with relational queries (newer API than RDDs)
 * [Structured Streaming](structured-streaming-programming-guide.html): processing structured data streams with relation queries (using Datasets and DataFrames, newer API than DStreams)
 * [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index adb1c9aaefcdc..7aec6a40d4c64 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -18,7 +18,7 @@ At a high level, it provides tools such as:
 
 **The MLlib RDD-based API is now in maintenance mode.**
 
-As of Spark 2.0, the [RDD](programming-guide.html#resilient-distributed-datasets-rdds)-based APIs in the `spark.mllib` package have entered maintenance mode.
+As of Spark 2.0, the [RDD](rdd-programming-guide.html#resilient-distributed-datasets-rdds)-based APIs in the `spark.mllib` package have entered maintenance mode.
 The primary Machine Learning API for Spark is now the [DataFrame](sql-programming-guide.html)-based API in the `spark.ml` package.
 
 *What are the implications?*
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index eefd7dcf1108b..14d76a6e41e23 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -116,7 +116,7 @@ is a stochastic gradient. Here `$S$` is the sampled subset of size `$|S|=$ miniB
 $\cdot n$`.
 
 In each iteration, the sampling over the distributed dataset
-([RDD](programming-guide.html#resilient-distributed-datasets-rdds)), as well as the
+([RDD](rdd-programming-guide.html#resilient-distributed-datasets-rdds)), as well as the
 computation of the sum of the partial results from each worker machine is performed by the
 standard spark routines.
 
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index edefbef93feb6..642575b46dd42 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -264,7 +264,7 @@ SPARK_WORKER_OPTS supports the following system properties:
 # Connecting an Application to the Cluster
 
 To run an application on the Spark cluster, simply pass the `spark://IP:PORT` URL of the master as to the [`SparkContext`
-constructor](programming-guide.html#initializing-spark).
+constructor](rdd-programming-guide.html#initializing-spark).
 
 To run an interactive Spark shell against the cluster, run the following command:
 
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index abd4ac9653606..fca0cf8ff05f2 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -535,7 +535,7 @@ After a context is defined, you have to do the following.
 It represents a continuous stream of data, either the input data stream received from source,
 or the processed data stream generated by transforming the input stream. Internally, a DStream
 is represented by a continuous series of RDDs, which is Spark's abstraction of an immutable,
-distributed dataset (see [Spark Programming Guide](programming-guide.html#resilient-distributed-datasets-rdds) for more details). Each RDD in a DStream contains data from a certain interval,
+distributed dataset (see [Spark Programming Guide](rdd-programming-guide.html#resilient-distributed-datasets-rdds) for more details). Each RDD in a DStream contains data from a certain interval,
 as shown in the following figure.
 
 <p style="text-align: center;">
@@ -1531,7 +1531,7 @@ default persistence level is set to replicate the data to two nodes for fault-to
 
 Note that, unlike RDDs, the default persistence level of DStreams keeps the data serialized in
 memory. This is further discussed in the [Performance Tuning](#memory-tuning) section. More
-information on different persistence levels can be found in the [Spark Programming Guide](programming-guide.html#rdd-persistence).
+information on different persistence levels can be found in the [Spark Programming Guide](rdd-programming-guide.html#rdd-persistence).
 
 ***
 
@@ -1720,7 +1720,13 @@ batch interval that is at least 10 seconds. It can be set by using
 
 ## Accumulators, Broadcast Variables, and Checkpoints
 
-[Accumulators](programming-guide.html#accumulators) and [Broadcast variables](programming-guide.html#broadcast-variables) cannot be recovered from checkpoint in Spark Streaming. If you enable checkpointing and use [Accumulators](programming-guide.html#accumulators) or [Broadcast variables](programming-guide.html#broadcast-variables) as well, you'll have to create lazily instantiated singleton instances for [Accumulators](programming-guide.html#accumulators) and [Broadcast variables](programming-guide.html#broadcast-variables) so that they can be re-instantiated after the driver restarts on failure. This is shown in the following example.
+[Accumulators](rdd-programming-guide.html#accumulators) and [Broadcast variables](rdd-programming-guide.html#broadcast-variables)
+cannot be recovered from checkpoint in Spark Streaming. If you enable checkpointing and use
+[Accumulators](rdd-programming-guide.html#accumulators) or [Broadcast variables](rdd-programming-guide.html#broadcast-variables)
+as well, you'll have to create lazily instantiated singleton instances for
+[Accumulators](rdd-programming-guide.html#accumulators) and [Broadcast variables](rdd-programming-guide.html#broadcast-variables)
+so that they can be re-instantiated after the driver restarts on failure.
+This is shown in the following example.
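For reference, the lazily instantiated singleton pattern that the streaming-programming-guide.md hunk above points at ("This is shown in the following example") looks roughly like the Scala sketch below. This is an illustrative sketch, not part of the patch: the object names (`WordFilter`, `DroppedWordsCounter`) and the broadcast contents are placeholders.

{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.util.LongAccumulator

// Lazily instantiated singleton holding a broadcast variable, so it can be
// re-created against the new SparkContext after the driver restarts from a checkpoint.
object WordFilter {
  @volatile private var instance: Broadcast[Seq[String]] = null

  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "b", "c"))  // placeholder contents
        }
      }
    }
    instance
  }
}

// Same pattern for an accumulator.
object DroppedWordsCounter {
  @volatile private var instance: LongAccumulator = null

  def getInstance(sc: SparkContext): LongAccumulator = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.longAccumulator("DroppedWordsCounter")
        }
      }
    }
    instance
  }
}
{% endhighlight %}

Calling `getInstance(rdd.sparkContext)` from inside an output operation such as `foreachRDD` means the broadcast variable and accumulator are rebuilt lazily against the recovered `SparkContext` after a driver restart, rather than being read back from the checkpoint.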