[SPARK-50997][DOCS] Remove \t character in docs
### What changes were proposed in this pull request?

This PR aims to remove `\t` characters in `docs`.

### Why are the changes needed?

This is a clean-up to keep whitespace usage consistent across `docs`.

### Does this PR introduce _any_ user-facing change?

No, these are white-space character changes in docs.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49682 from dongjoon-hyun/SPARK-50997.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun committed Jan 27, 2025
1 parent 9da1cd0 commit 77096a2
Showing 10 changed files with 61 additions and 61 deletions.
14 changes: 7 additions & 7 deletions docs/mllib-decision-tree.md
@@ -58,19 +58,19 @@ impurity measure for regression (variance).
<tbody>
<tr>
<td>Gini impurity</td>
-<td>Classification</td>
-<td>$\sum_{i=1}^{C} f_i(1-f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
+<td>Classification</td>
+<td>$\sum_{i=1}^{C} f_i(1-f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
</tr>
<tr>
<td>Entropy</td>
-<td>Classification</td>
-<td>$\sum_{i=1}^{C} -f_ilog(f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
+<td>Classification</td>
+<td>$\sum_{i=1}^{C} -f_ilog(f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
</tr>
<tr>
<td>Variance</td>
-<td>Regression</td>
-<td>$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$</td><td>$y_i$ is label for an instance,
-$N$ is the number of instances and $\mu$ is the mean given by $\frac{1}{N} \sum_{i=1}^N y_i$.</td>
+<td>Regression</td>
+<td>$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$</td><td>$y_i$ is label for an instance,
+$N$ is the number of instances and $\mu$ is the mean given by $\frac{1}{N} \sum_{i=1}^N y_i$.</td>
</tr>
</tbody>
</table>
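The impurity formulas in the table above can be sanity-checked with a short, self-contained Python sketch. This is not part of the commit; it is plain Python on made-up toy data, purely illustrative (entropy here uses the natural log, matching the unqualified $log$ in the table):

```python
import math

def gini(labels):
    """Gini impurity: sum over unique labels of f_i * (1 - f_i)."""
    n = len(labels)
    freqs = [labels.count(c) / n for c in set(labels)]
    return sum(f * (1 - f) for f in freqs)

def entropy(labels):
    """Entropy: sum over unique labels of -f_i * log(f_i)."""
    n = len(labels)
    freqs = [labels.count(c) / n for c in set(labels)]
    return sum(-f * math.log(f) for f in freqs)

def variance(values):
    """Variance: (1/N) * sum of (y_i - mu)^2, with mu the mean label."""
    mu = sum(values) / len(values)
    return sum((y - mu) ** 2 for y in values) / len(values)

print(gini([0, 0, 1, 1]))     # 0.5 for a perfectly mixed binary node
print(entropy([0, 0, 1, 1]))  # log(2), about 0.693
print(variance([1.0, 2.0, 3.0]))
```

A pure node (all labels identical) gives 0 for both Gini and entropy, which is why both work as split criteria.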
12 changes: 6 additions & 6 deletions docs/mllib-ensembles.md
@@ -198,18 +198,18 @@ Notation: $N$ = number of instances. $y_i$ = label of instance $i$. $x_i$ = features of instance $i$.
<tbody>
<tr>
<td>Log Loss</td>
-<td>Classification</td>
-<td>$2 \sum_{i=1}^{N} \log(1+\exp(-2 y_i F(x_i)))$</td><td>Twice binomial negative log likelihood.</td>
+<td>Classification</td>
+<td>$2 \sum_{i=1}^{N} \log(1+\exp(-2 y_i F(x_i)))$</td><td>Twice binomial negative log likelihood.</td>
</tr>
<tr>
<td>Squared Error</td>
-<td>Regression</td>
-<td>$\sum_{i=1}^{N} (y_i - F(x_i))^2$</td><td>Also called L2 loss. Default loss for regression tasks.</td>
+<td>Regression</td>
+<td>$\sum_{i=1}^{N} (y_i - F(x_i))^2$</td><td>Also called L2 loss. Default loss for regression tasks.</td>
</tr>
<tr>
<td>Absolute Error</td>
-<td>Regression</td>
-<td>$\sum_{i=1}^{N} |y_i - F(x_i)|$</td><td>Also called L1 loss. Can be more robust to outliers than Squared Error.</td>
+<td>Regression</td>
+<td>$\sum_{i=1}^{N} |y_i - F(x_i)|$</td><td>Also called L1 loss. Can be more robust to outliers than Squared Error.</td>
</tr>
</tbody>
</table>
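The three GBT losses in the table above can likewise be checked numerically. The sketch below is illustrative plain Python, not part of the commit; for log loss it assumes labels $y_i \in \{-1, +1\}$, as the formula's $-2 y_i F(x_i)$ term implies:

```python
import math

def log_loss(y, pred):
    """Twice the binomial negative log-likelihood; labels y_i in {-1, +1}."""
    return 2 * sum(math.log(1 + math.exp(-2 * yi * fi))
                   for yi, fi in zip(y, pred))

def squared_error(y, pred):
    """L2 loss: sum of squared residuals."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, pred))

def absolute_error(y, pred):
    """L1 loss: sum of absolute residuals."""
    return sum(abs(yi - fi) for yi, fi in zip(y, pred))

print(squared_error([1.0, 2.0], [0.0, 0.0]))   # 5.0
print(absolute_error([1.0, 2.0], [0.0, 0.0]))  # 3.0
print(log_loss([1, -1], [0.0, 0.0]))           # 4 * log(2), about 2.77
```

The contrast between the L1 and L2 results on the same residuals is the point of the table's "more robust to outliers" remark: large residuals are penalized quadratically by L2 but only linearly by L1.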
2 changes: 1 addition & 1 deletion docs/running-on-kubernetes.md
@@ -1470,7 +1470,7 @@ See the [configuration page](configuration.html) for information on Spark configurations.
<td><code>spark.kubernetes.executor.scheduler.name</code></td>
<td>(none)</td>
<td>
-Specify the scheduler name for each executor pod.
+Specify the scheduler name for each executor pod.
</td>
<td>3.0.0</td>
</tr>
4 changes: 2 additions & 2 deletions docs/sql-ref-ansi-compliance.md
@@ -281,8 +281,8 @@ Note, arithmetic operations have special rules to calculate the least common type.
| Operation | Result precision | Result scale |
|------------|------------------------------------------|---------------------|
| e1 + e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
-| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
-| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
+| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
+| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
| e1 / e2 | p1 - s1 + s2 + max(6, s1 + p2 + 1) | max(6, s1 + p2 + 1) |
| e1 % e2 | min(p1 - s1, p2 - s2) + max(s1, s2) | max(s1, s2) |
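The precision/scale rules in the table can be expressed directly as a small helper. The sketch below is illustrative plain Python, not part of the commit, and deliberately ignores the subsequent capping of results at the maximum decimal precision of 38:

```python
def decimal_result_type(op, p1, s1, p2, s2):
    """Result (precision, scale) for a decimal operation, per the table above.

    e1 is DECIMAL(p1, s1) and e2 is DECIMAL(p2, s2).
    """
    if op in ('+', '-'):
        # max(s1, s2) fractional digits, enough integral digits, plus carry.
        return (max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2))
    if op == '*':
        return (p1 + p2 + 1, s1 + s2)
    if op == '/':
        scale = max(6, s1 + p2 + 1)
        return (p1 - s1 + s2 + scale, scale)
    if op == '%':
        return (min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
    raise ValueError(f"unsupported operation: {op}")

# DECIMAL(10, 2) + DECIMAL(5, 3) -> DECIMAL(12, 3)
print(decimal_result_type('+', 10, 2, 5, 3))
```

For example, adding a `DECIMAL(10, 2)` to a `DECIMAL(5, 3)` keeps 3 fractional digits and up to 8 integral digits, plus one digit for a possible carry, giving `DECIMAL(12, 3)`.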

22 changes: 11 additions & 11 deletions docs/sql-ref-syntax-qry-select-aggregate.md
@@ -94,23 +94,23 @@ SELECT * FROM basic_pays;
+-----------------+----------+------+
| employee_name|department|salary|
+-----------------+----------+------+
-| Anthony Bow|Accounting| 6627|
+| Anthony Bow|Accounting| 6627|
| Barry Jones| SCM| 10586|
-| Diane Murphy|Accounting| 8435|
-| Foon Yue Tseng| Sales| 6660|
+| Diane Murphy|Accounting| 8435|
+| Foon Yue Tseng| Sales| 6660|
| George Vanauf| Sales| 10563|
| Gerard Bondur|Accounting| 11472|
-| Gerard Hernandez| SCM| 6949|
-| Jeff Firrelli|Accounting| 8992|
-| Julie Firrelli| Sales| 9181|
+| Gerard Hernandez| SCM| 6949|
+| Jeff Firrelli|Accounting| 8992|
+| Julie Firrelli| Sales| 9181|
| Larry Bott| SCM| 11798|
-| Leslie Jennings| IT| 8113|
-| Leslie Thompson| IT| 5186|
+| Leslie Jennings| IT| 8113|
+| Leslie Thompson| IT| 5186|
| Loui Bondur| SCM| 10449|
-| Mary Patterson|Accounting| 9998|
+| Mary Patterson|Accounting| 9998|
| Pamela Castillo| SCM| 11303|
-| Steve Patterson| Sales| 9441|
-|William Patterson|Accounting| 8870|
+| Steve Patterson| Sales| 9441|
+|William Patterson|Accounting| 8870|
+-----------------+----------+------+

SELECT
22 changes: 11 additions & 11 deletions docs/sql-ref-syntax-qry-select-transform.md
@@ -238,17 +238,17 @@ SELECT TRANSFORM(zip_code, name, age)
USING 'cat'
FROM person
WHERE zip_code > 94500;
-+-------+---------------------+
-| key| value|
-+-------+---------------------+
-| 94588| Anil K 27|
-| 94588| John V \N|
-| 94511| Aryan B. 18|
-| 94511| David K 42|
-| 94588| Zen Hui 50|
-| 94588| Dan Li 18|
-| 94511| Lalit B. \N|
-+-------+---------------------+
++-------+----------------+
+| key| value|
++-------+----------------+
+| 94588| Anil K 27|
+| 94588| John V \N|
+| 94511| Aryan B. 18|
+| 94511| David K 42|
+| 94588| Zen Hui 50|
+| 94588| Dan Li 18|
+| 94511| Lalit B. \N|
++-------+----------------+
```

### Related Statements
6 changes: 3 additions & 3 deletions docs/streaming-kafka-0-10-integration.md
@@ -26,9 +26,9 @@ there are notable differences in usage.
### Linking
For Scala/Java applications using SBT/Maven project definitions, link your streaming application with the following artifact (see [Linking section](streaming-programming-guide.html#linking) in the main programming guide for further information).

-groupId = org.apache.spark
-artifactId = spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
-version = {{site.SPARK_VERSION_SHORT}}
+groupId = org.apache.spark
+artifactId = spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
+version = {{site.SPARK_VERSION_SHORT}}

**Do not** manually add dependencies on `org.apache.kafka` artifacts (e.g. `kafka-clients`). The `spark-streaming-kafka-0-10` artifact has the appropriate transitive dependencies already, and different versions may be incompatible in hard to diagnose ways.

6 changes: 3 additions & 3 deletions docs/streaming-programming-guide.md
@@ -414,7 +414,7 @@ Similar to Spark, Spark Streaming is available through Maven Central. To write your own Spark Streaming program, you will have to add the following dependency to your SBT or Maven project.
<div class="codetabs">
<div data-lang="Maven" markdown="1">

-<dependency>
+<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_{{site.SCALA_BINARY_VERSION}}</artifactId>
<version>{{site.SPARK_VERSION}}</version>
Expand All @@ -423,7 +423,7 @@ Similar to Spark, Spark Streaming is available through Maven Central. To write y
</div>
<div data-lang="SBT" markdown="1">

-libraryDependencies += "org.apache.spark" % "spark-streaming_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION}}" % "provided"
+libraryDependencies += "org.apache.spark" % "spark-streaming_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION}}" % "provided"
</div>
</div>

@@ -2191,7 +2191,7 @@ improve the performance of your application. At a high level, you need to consider two things:
1. Reducing the processing time of each batch of data by efficiently using cluster resources.

2. Setting the right batch size such that the batches of data can be processed as fast as they
-are received (that is, data processing keeps up with the data ingestion).
+are received (that is, data processing keeps up with the data ingestion).

## Reducing the Batch Processing Times
There are a number of optimizations that can be done in Spark to minimize the processing time of
4 changes: 2 additions & 2 deletions docs/streaming/performance-tips.md
@@ -43,9 +43,9 @@ val stream = spark.readStream
.load()
val query = stream.writeStream
.format("kafka")
-.option("topic", "out")
+.option("topic", "out")
.option("checkpointLocation", "/tmp/checkpoint")
-.option("asyncProgressTrackingEnabled", "true")
+.option("asyncProgressTrackingEnabled", "true")
.start()
```

30 changes: 15 additions & 15 deletions docs/web-ui.md
@@ -80,15 +80,15 @@ This page displays the details of a specific job identified by its job ID.
</p>

* List of stages (grouped by state active, pending, completed, skipped, and failed)
-* Stage ID
-* Description of the stage
-* Submitted timestamp
-* Duration of the stage
-* Tasks progress bar
-* Input: Bytes read from storage in this stage
-* Output: Bytes written in storage in this stage
-* Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
-* Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage
+* Stage ID
+* Description of the stage
+* Submitted timestamp
+* Duration of the stage
+* Tasks progress bar
+* Input: Bytes read from storage in this stage
+* Output: Bytes written in storage in this stage
+* Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
+* Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage

<p style="text-align: center;">
<img src="img/JobPageDetail3.png" title="DAG" alt="DAG">
@@ -479,12 +479,12 @@ The third section has the SQL statistics of the submitted operations.
* **Duration time** is the difference between close time and start time.
* **Statement** is the operation being executed.
* **State** of the process.
-* _Started_, first state, when the process begins.
-* _Compiled_, execution plan generated.
-* _Failed_, final state when the execution failed or finished with error.
-* _Canceled_, final state when the execution is canceled.
-* _Finished_ processing and waiting to fetch results.
-* _Closed_, final state when client closed the statement.
+* _Started_, first state, when the process begins.
+* _Compiled_, execution plan generated.
+* _Failed_, final state when the execution failed or finished with error.
+* _Canceled_, final state when the execution is canceled.
+* _Finished_ processing and waiting to fetch results.
+* _Closed_, final state when client closed the statement.
* **Detail** of the execution plan with parsed logical plan, analyzed logical plan, optimized logical plan and physical plan or errors in the SQL statement.

<p style="text-align: center;">
