[SPARK-50997][DOCS] Remove \t character in docs
### What changes were proposed in this pull request?

This PR aims to remove `\t` characters in `docs`.

### Why are the changes needed?

This is a clean-up to keep whitespace usage consistent across `docs`.

### Does this PR introduce _any_ user-facing change?

No, these are white-space character changes in docs.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49682 from dongjoon-hyun/SPARK-50997.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun committed Jan 27, 2025
1 parent 9da1cd0 commit 77096a2
Showing 10 changed files with 61 additions and 61 deletions.
14 changes: 7 additions & 7 deletions docs/mllib-decision-tree.md
@@ -58,19 +58,19 @@ impurity measure for regression (variance).
<tbody>
<tr>
<td>Gini impurity</td>
-<td>Classification</td>
-<td>$\sum_{i=1}^{C} f_i(1-f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
+<td>Classification</td>
+<td>$\sum_{i=1}^{C} f_i(1-f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
</tr>
<tr>
<td>Entropy</td>
-<td>Classification</td>
-<td>$\sum_{i=1}^{C} -f_ilog(f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
+<td>Classification</td>
+<td>$\sum_{i=1}^{C} -f_ilog(f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
</tr>
<tr>
<td>Variance</td>
-<td>Regression</td>
-<td>$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$</td><td>$y_i$ is label for an instance,
-$N$ is the number of instances and $\mu$ is the mean given by $\frac{1}{N} \sum_{i=1}^N y_i$.</td>
+<td>Regression</td>
+<td>$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$</td><td>$y_i$ is label for an instance,
+$N$ is the number of instances and $\mu$ is the mean given by $\frac{1}{N} \sum_{i=1}^N y_i$.</td>
</tr>
</tbody>
</table>
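The impurity formulas in the table above can be sanity-checked with a short, self-contained Python sketch. This is not part of the commit; it is plain Python on made-up toy data, purely illustrative (entropy here uses the natural log, matching the unqualified $log$ in the table):

```python
import math

def gini(labels):
    """Gini impurity: sum over unique labels of f_i * (1 - f_i)."""
    n = len(labels)
    freqs = [labels.count(c) / n for c in set(labels)]
    return sum(f * (1 - f) for f in freqs)

def entropy(labels):
    """Entropy: sum over unique labels of -f_i * log(f_i)."""
    n = len(labels)
    freqs = [labels.count(c) / n for c in set(labels)]
    return sum(-f * math.log(f) for f in freqs)

def variance(values):
    """Variance: (1/N) * sum of (y_i - mu)^2, with mu the mean label."""
    mu = sum(values) / len(values)
    return sum((y - mu) ** 2 for y in values) / len(values)

print(gini([0, 0, 1, 1]))     # 0.5 for a perfectly mixed binary node
print(entropy([0, 0, 1, 1]))  # log(2), about 0.693
print(variance([1.0, 2.0, 3.0]))
```

A pure node (all labels identical) gives 0 for both Gini and entropy, which is why both work as split criteria.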
12 changes: 6 additions & 6 deletions docs/mllib-ensembles.md
@@ -198,18 +198,18 @@ Notation: $N$ = number of instances. $y_i$ = label of instance $i$. $x_i$ = features of instance $i$.
<tbody>
<tr>
<td>Log Loss</td>
-<td>Classification</td>
-<td>$2 \sum_{i=1}^{N} \log(1+\exp(-2 y_i F(x_i)))$</td><td>Twice binomial negative log likelihood.</td>
+<td>Classification</td>
+<td>$2 \sum_{i=1}^{N} \log(1+\exp(-2 y_i F(x_i)))$</td><td>Twice binomial negative log likelihood.</td>
</tr>
<tr>
<td>Squared Error</td>
-<td>Regression</td>
-<td>$\sum_{i=1}^{N} (y_i - F(x_i))^2$</td><td>Also called L2 loss. Default loss for regression tasks.</td>
+<td>Regression</td>
+<td>$\sum_{i=1}^{N} (y_i - F(x_i))^2$</td><td>Also called L2 loss. Default loss for regression tasks.</td>
</tr>
<tr>
<td>Absolute Error</td>
-<td>Regression</td>
-<td>$\sum_{i=1}^{N} |y_i - F(x_i)|$</td><td>Also called L1 loss. Can be more robust to outliers than Squared Error.</td>
+<td>Regression</td>
+<td>$\sum_{i=1}^{N} |y_i - F(x_i)|$</td><td>Also called L1 loss. Can be more robust to outliers than Squared Error.</td>
</tr>
</tbody>
</table>
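The three GBT losses in the table above can likewise be checked numerically. The sketch below is illustrative plain Python, not part of the commit; for log loss it assumes labels $y_i \in \{-1, +1\}$, as the formula's $-2 y_i F(x_i)$ term implies:

```python
import math

def log_loss(y, pred):
    """Twice the binomial negative log-likelihood; labels y_i in {-1, +1}."""
    return 2 * sum(math.log(1 + math.exp(-2 * yi * fi))
                   for yi, fi in zip(y, pred))

def squared_error(y, pred):
    """L2 loss: sum of squared residuals."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, pred))

def absolute_error(y, pred):
    """L1 loss: sum of absolute residuals."""
    return sum(abs(yi - fi) for yi, fi in zip(y, pred))

print(squared_error([1.0, 2.0], [0.0, 0.0]))   # 5.0
print(absolute_error([1.0, 2.0], [0.0, 0.0]))  # 3.0
print(log_loss([1, -1], [0.0, 0.0]))           # 4 * log(2), about 2.77
```

The contrast between the L1 and L2 results on the same residuals is the point of the table's "more robust to outliers" remark: large residuals are penalized quadratically by L2 but only linearly by L1.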
2 changes: 1 addition & 1 deletion docs/running-on-kubernetes.md
@@ -1470,7 +1470,7 @@ See the [configuration page](configuration.html) for information on Spark configurations.
<td><code>spark.kubernetes.executor.scheduler.name</code></td>
<td>(none)</td>
<td>
-Specify the scheduler name for each executor pod.
+Specify the scheduler name for each executor pod.
</td>
<td>3.0.0</td>
</tr>
4 changes: 2 additions & 2 deletions docs/sql-ref-ansi-compliance.md
@@ -281,8 +281,8 @@ Note, arithmetic operations have special rules to calculate the least common type.
| Operation | Result precision | Result scale |
|------------|------------------------------------------|---------------------|
| e1 + e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
-| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
-| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
+| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
+| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
| e1 / e2 | p1 - s1 + s2 + max(6, s1 + p2 + 1) | max(6, s1 + p2 + 1) |
| e1 % e2 | min(p1 - s1, p2 - s2) + max(s1, s2) | max(s1, s2) |
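The precision/scale rules in the table can be expressed directly as a small helper. The sketch below is illustrative plain Python, not part of the commit, and deliberately ignores the subsequent capping of results at the maximum decimal precision of 38:

```python
def decimal_result_type(op, p1, s1, p2, s2):
    """Result (precision, scale) for a decimal operation, per the table above.

    e1 is DECIMAL(p1, s1) and e2 is DECIMAL(p2, s2).
    """
    if op in ('+', '-'):
        # max(s1, s2) fractional digits, enough integral digits, plus carry.
        return (max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2))
    if op == '*':
        return (p1 + p2 + 1, s1 + s2)
    if op == '/':
        scale = max(6, s1 + p2 + 1)
        return (p1 - s1 + s2 + scale, scale)
    if op == '%':
        return (min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
    raise ValueError(f"unsupported operation: {op}")

# DECIMAL(10, 2) + DECIMAL(5, 3) -> DECIMAL(12, 3)
print(decimal_result_type('+', 10, 2, 5, 3))
```

For example, adding a `DECIMAL(10, 2)` to a `DECIMAL(5, 3)` keeps 3 fractional digits and up to 8 integral digits, plus one digit for a possible carry, giving `DECIMAL(12, 3)`.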

22 changes: 11 additions & 11 deletions docs/sql-ref-syntax-qry-select-aggregate.md
@@ -94,23 +94,23 @@ SELECT * FROM basic_pays;
+-----------------+----------+------+
| employee_name|department|salary|
+-----------------+----------+------+
-| Anthony Bow|Accounting| 6627|
+| Anthony Bow|Accounting| 6627|
| Barry Jones| SCM| 10586|
-| Diane Murphy|Accounting| 8435|
-| Foon Yue Tseng| Sales| 6660|
+| Diane Murphy|Accounting| 8435|
+| Foon Yue Tseng| Sales| 6660|
| George Vanauf| Sales| 10563|
| Gerard Bondur|Accounting| 11472|
-| Gerard Hernandez| SCM| 6949|
-| Jeff Firrelli|Accounting| 8992|
-| Julie Firrelli| Sales| 9181|
+| Gerard Hernandez| SCM| 6949|
+| Jeff Firrelli|Accounting| 8992|
+| Julie Firrelli| Sales| 9181|
| Larry Bott| SCM| 11798|
-| Leslie Jennings| IT| 8113|
-| Leslie Thompson| IT| 5186|
+| Leslie Jennings| IT| 8113|
+| Leslie Thompson| IT| 5186|
| Loui Bondur| SCM| 10449|
-| Mary Patterson|Accounting| 9998|
+| Mary Patterson|Accounting| 9998|
| Pamela Castillo| SCM| 11303|
-| Steve Patterson| Sales| 9441|
-|William Patterson|Accounting| 8870|
+| Steve Patterson| Sales| 9441|
+|William Patterson|Accounting| 8870|
+-----------------+----------+------+

SELECT
22 changes: 11 additions & 11 deletions docs/sql-ref-syntax-qry-select-transform.md
@@ -238,17 +238,17 @@ SELECT TRANSFORM(zip_code, name, age)
USING 'cat'
FROM person
WHERE zip_code > 94500;
-+-------+---------------------+
-| key| value|
-+-------+---------------------+
-| 94588| Anil K 27|
-| 94588| John V \N|
-| 94511| Aryan B. 18|
-| 94511| David K 42|
-| 94588| Zen Hui 50|
-| 94588| Dan Li 18|
-| 94511| Lalit B. \N|
-+-------+---------------------+
++-------+----------------+
+| key| value|
++-------+----------------+
+| 94588| Anil K 27|
+| 94588| John V \N|
+| 94511| Aryan B. 18|
+| 94511| David K 42|
+| 94588| Zen Hui 50|
+| 94588| Dan Li 18|
+| 94511| Lalit B. \N|
++-------+----------------+
```

### Related Statements
6 changes: 3 additions & 3 deletions docs/streaming-kafka-0-10-integration.md
@@ -26,9 +26,9 @@ there are notable differences in usage.
### Linking
For Scala/Java applications using SBT/Maven project definitions, link your streaming application with the following artifact (see [Linking section](streaming-programming-guide.html#linking) in the main programming guide for further information).

-groupId = org.apache.spark
-artifactId = spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
-version = {{site.SPARK_VERSION_SHORT}}
+groupId = org.apache.spark
+artifactId = spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
+version = {{site.SPARK_VERSION_SHORT}}

**Do not** manually add dependencies on `org.apache.kafka` artifacts (e.g. `kafka-clients`). The `spark-streaming-kafka-0-10` artifact has the appropriate transitive dependencies already, and different versions may be incompatible in hard to diagnose ways.

6 changes: 3 additions & 3 deletions docs/streaming-programming-guide.md
@@ -414,7 +414,7 @@ Similar to Spark, Spark Streaming is available through Maven Central. To write your own Spark Streaming program, you will have to add the following dependency to your SBT or Maven project.
<div class="codetabs">
<div data-lang="Maven" markdown="1">

-<dependency>
+<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_{{site.SCALA_BINARY_VERSION}}</artifactId>
<version>{{site.SPARK_VERSION}}</version>
Expand All @@ -423,7 +423,7 @@ Similar to Spark, Spark Streaming is available through Maven Central. To write y
</div>
<div data-lang="SBT" markdown="1">

-libraryDependencies += "org.apache.spark" % "spark-streaming_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION}}" % "provided"
+libraryDependencies += "org.apache.spark" % "spark-streaming_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION}}" % "provided"
</div>
</div>

@@ -2191,7 +2191,7 @@ improve the performance of your application. At a high level, you need to consider two things:
1. Reducing the processing time of each batch of data by efficiently using cluster resources.

2. Setting the right batch size such that the batches of data can be processed as fast as they
-are received (that is, data processing keeps up with the data ingestion).
+are received (that is, data processing keeps up with the data ingestion).

## Reducing the Batch Processing Times
There are a number of optimizations that can be done in Spark to minimize the processing time of
4 changes: 2 additions & 2 deletions docs/streaming/performance-tips.md
@@ -43,9 +43,9 @@ val stream = spark.readStream
.load()
val query = stream.writeStream
.format("kafka")
-.option("topic", "out")
+.option("topic", "out")
.option("checkpointLocation", "/tmp/checkpoint")
-.option("asyncProgressTrackingEnabled", "true")
+.option("asyncProgressTrackingEnabled", "true")
.start()
```

30 changes: 15 additions & 15 deletions docs/web-ui.md
@@ -80,15 +80,15 @@ This page displays the details of a specific job identified by its job ID.
</p>

* List of stages (grouped by state active, pending, completed, skipped, and failed)
-* Stage ID
-* Description of the stage
-* Submitted timestamp
-* Duration of the stage
-* Tasks progress bar
-* Input: Bytes read from storage in this stage
-* Output: Bytes written in storage in this stage
-* Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
-* Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage
+* Stage ID
+* Description of the stage
+* Submitted timestamp
+* Duration of the stage
+* Tasks progress bar
+* Input: Bytes read from storage in this stage
+* Output: Bytes written in storage in this stage
+* Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
+* Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage

<p style="text-align: center;">
<img src="img/JobPageDetail3.png" title="DAG" alt="DAG">
@@ -479,12 +479,12 @@ The third section has the SQL statistics of the submitted operations.
* **Duration time** is the difference between close time and start time.
* **Statement** is the operation being executed.
* **State** of the process.
-* _Started_, first state, when the process begins.
-* _Compiled_, execution plan generated.
-* _Failed_, final state when the execution failed or finished with error.
-* _Canceled_, final state when the execution is canceled.
-* _Finished_ processing and waiting to fetch results.
-* _Closed_, final state when client closed the statement.
+* _Started_, first state, when the process begins.
+* _Compiled_, execution plan generated.
+* _Failed_, final state when the execution failed or finished with error.
+* _Canceled_, final state when the execution is canceled.
+* _Finished_ processing and waiting to fetch results.
+* _Closed_, final state when client closed the statement.
* **Detail** of the execution plan with parsed logical plan, analyzed logical plan, optimized logical plan and physical plan or errors in the SQL statement.

<p style="text-align: center;">
