This repository has been archived by the owner on Nov 15, 2024. It is now read-only.

[SPARK-21267][SS][DOCS] Update Structured Streaming Documentation
## What changes were proposed in this pull request?

A few changes to the Structured Streaming documentation:
- Clarify that the entire stream input table is not materialized
- Add information for Ganglia
- Add Kafka Sink to the main docs
- Remove a couple of leftover experimental tags
- Add more associated reading material and talk videos

In addition, apache#16856 broke the link to the RDD programming guide in several places while renaming the page. This PR fixes those links (cc sameeragarwal, cloud-fan):
- Add a redirection to avoid breaking internal and possibly external links.
- Remove unnecessary redirection pages that had been there since the separate Scala, Java, and Python programming guides were merged together in 2013 or 2014.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes apache#18485 from tdas/SPARK-21267.

(cherry picked from commit 0217dfd)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
tdas authored and MatthewRBruce committed Jul 31, 2018
1 parent 1cf241a commit 72f0634
Showing 10 changed files with 169 additions and 72 deletions.
7 changes: 3 additions & 4 deletions docs/_layouts/global.html
@@ -69,11 +69,10 @@
 <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
 <ul class="dropdown-menu">
 <li><a href="quick-start.html">Quick Start</a></li>
-<li><a href="programming-guide.html">Spark Programming Guide</a></li>
-<li class="divider"></li>
-<li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
-<li><a href="sql-programming-guide.html">DataFrames, Datasets and SQL</a></li>
+<li><a href="rdd-programming-guide.html">RDDs, Accumulators, Broadcasts Vars</a></li>
+<li><a href="sql-programming-guide.html">SQL, DataFrames, and Datasets</a></li>
 <li><a href="structured-streaming-programming-guide.html">Structured Streaming</a></li>
+<li><a href="streaming-programming-guide.html">Spark Streaming (DStreams)</a></li>
 <li><a href="ml-guide.html">MLlib (Machine Learning)</a></li>
 <li><a href="graphx-programming-guide.html">GraphX (Graph Processing)</a></li>
 <li><a href="sparkr.html">SparkR (R on Spark)</a></li>
13 changes: 6 additions & 7 deletions docs/index.md
@@ -88,13 +88,12 @@ options for deployment:
 **Programming Guides:**

 * [Quick Start](quick-start.html): a quick introduction to the Spark API; start here!
-* [Spark Programming Guide](programming-guide.html): detailed overview of Spark
-in all supported languages (Scala, Java, Python, R)
-* Modules built on Spark:
-* [Spark Streaming](streaming-programming-guide.html): processing real-time data streams
-* [Spark SQL, Datasets, and DataFrames](sql-programming-guide.html): support for structured data and relational queries
-* [MLlib](ml-guide.html): built-in machine learning library
-* [GraphX](graphx-programming-guide.html): Spark's new API for graph processing
+* [RDD Programming Guide](programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
+* [Spark SQL, Datasets, and DataFrames](sql-programming-guide.html): processing structured data with relational queries (newer API than RDDs)
+* [Structured Streaming](structured-streaming-programming-guide.html): processing structured data streams with relation queries (using Datasets and DataFrames, newer API than DStreams)
+* [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
+* [MLlib](ml-guide.html): applying machine learning algorithms
+* [GraphX](graphx-programming-guide.html): processing graphs

 **API Docs:**

7 changes: 0 additions & 7 deletions docs/java-programming-guide.md

This file was deleted.

7 changes: 7 additions & 0 deletions docs/programming-guide.md
@@ -0,0 +1,7 @@
+---
+layout: global
+title: Spark Programming Guide
+redirect: rdd-programming-guide.html
+---
+
+This document has moved [here](rdd-programming-guide.html).
7 changes: 0 additions & 7 deletions docs/python-programming-guide.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/rdd-programming-guide.md
@@ -1,6 +1,6 @@
 ---
 layout: global
-title: Spark Programming Guide
+title: RDD Programming Guide
 description: Spark SPARK_VERSION_SHORT programming guide in Java, Scala and Python
 ---

7 changes: 0 additions & 7 deletions docs/scala-programming-guide.md

This file was deleted.

16 changes: 3 additions & 13 deletions docs/sql-programming-guide.md
@@ -392,41 +392,31 @@ While those functions are designed for DataFrames, Spark SQL also has type-safe
 Moreover, users are not limited to the predefined aggregate functions and can create their own.

 ### Untyped User-Defined Aggregate Functions
-
-<div class="codetabs">
-
-<div data-lang="scala" markdown="1">
-
 Users have to extend the [UserDefinedAggregateFunction](api/scala/index.html#org.apache.spark.sql.expressions.UserDefinedAggregateFunction)
 abstract class to implement a custom untyped aggregate function. For example, a user-defined average
 can look like:

+<div class="codetabs">
+<div data-lang="scala" markdown="1">
 {% include_example untyped_custom_aggregation scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala%}
 </div>
-
 <div data-lang="java" markdown="1">
-
 {% include_example untyped_custom_aggregation java/org/apache/spark/examples/sql/JavaUserDefinedUntypedAggregation.java%}
 </div>
-
 </div>

 ### Type-Safe User-Defined Aggregate Functions

 User-defined aggregations for strongly typed Datasets revolve around the [Aggregator](api/scala/index.html#org.apache.spark.sql.expressions.Aggregator) abstract class.
 For example, a type-safe user-defined average can look like:
-<div class="codetabs">

+<div class="codetabs">
 <div data-lang="scala" markdown="1">
-
 {% include_example typed_custom_aggregation scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala%}
 </div>
-
 <div data-lang="java" markdown="1">
-
 {% include_example typed_custom_aggregation java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java%}
 </div>
-
 </div>

 # Data Sources
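
Reviewer's note on the hunk above: the included example files are not shown in this diff, so here is a rough illustration of the contract a type-safe user-defined average follows. Spark's `Aggregator` asks for a zero buffer, a per-row reduce, a cross-partition merge, and a final projection. The sketch below is plain Python, not the Spark API; the names `AvgBuffer`, `reduce_value`, and `average` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class AvgBuffer:
    """Intermediate state: running sum and row count (hypothetical name)."""
    total: float = 0.0
    count: int = 0


def zero() -> AvgBuffer:
    # Identity buffer: merging it with any buffer b yields b.
    return AvgBuffer()


def reduce_value(buf: AvgBuffer, value: float) -> AvgBuffer:
    # Fold one input row into the buffer.
    return AvgBuffer(buf.total + value, buf.count + 1)


def merge(a: AvgBuffer, b: AvgBuffer) -> AvgBuffer:
    # Combine partial buffers computed on different partitions.
    return AvgBuffer(a.total + b.total, a.count + b.count)


def finish(buf: AvgBuffer) -> float:
    # Project the final buffer to the output value; NaN when there were no rows.
    return buf.total / buf.count if buf.count else float("nan")


def average(partitions: Iterable[List[float]]) -> float:
    # Reduce each partition independently, then merge the partial buffers.
    partials = [reduce(reduce_value, part, zero()) for part in partitions]
    return finish(reduce(merge, partials, zero()))
```

Keeping the buffer as a (sum, count) pair rather than a running mean is what makes `merge` associative, so partial results can be combined in any order across partitions.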