Revert "[DOC] Add new download page with Spark Connect description an… #592

Closed
wants to merge 1 commit into from
6 changes: 3 additions & 3 deletions Gemfile.lock
@@ -4,15 +4,15 @@ GEM
addressable (2.8.7)
public_suffix (>= 2.0.2, < 7.0)
colorator (1.1.0)
concurrent-ruby (1.3.5)
concurrent-ruby (1.3.4)
em-websocket (0.5.3)
eventmachine (>= 0.12.9)
http_parser.rb (~> 0)
eventmachine (1.2.7)
ffi (1.17.1)
forwardable-extended (2.6.0)
http_parser.rb (0.8.0)
i18n (1.14.7)
i18n (1.14.6)
concurrent-ruby (~> 1.0)
jekyll (4.2.0)
addressable (~> 2.4)
@@ -48,7 +48,7 @@ GEM
rb-fsevent (0.11.2)
rb-inotify (0.11.1)
ffi (~> 1.0)
rexml (3.4.1)
rexml (3.4.0)
rouge (3.26.0)
safe_yaml (1.0.5)
sassc (2.4.0)
62 changes: 7 additions & 55 deletions downloads.md
@@ -16,34 +16,6 @@ window.onload = function () {
}
</script>

## Introduction

Unlike previous Apache Spark™ releases, Spark 4.0 has two distinct distributions: _classic_ and _connect_. As the names suggest, the _classic_ Spark version is the usual distribution you would expect for any new Spark release. The _connect_ distribution, in contrast, is the version with [Spark Connect](https://spark.apache.org/docs/4.0.0-preview2/spark-connect-overview.html) enabled by default. Which one should you download?

Select the _connect_ version if your workloads only use standard [DataFrame](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/dataframe.html) and [Spark SQL](https://spark.apache.org/docs/latest/api/sql/) APIs. Choose the _classic_ version for traditional workloads requiring access to [RDD APIs](https://spark.apache.org/docs/latest/api/python/reference/pyspark.html#rdd-apis), [SparkContext APIs](https://spark.apache.org/docs/latest/api/python/reference/pyspark.html#spark-context-apis), JVM properties, and custom catalyst rules/plans.

If you are not familiar with Spark Connect, its primary benefit is a stable client API that decouples the client from the Spark Driver. This makes Spark projects much easier to maintain over time, since you can update the Spark Driver and server-side dependencies without having to update the client. To learn more about Spark Connect and its architecture and benefits, see [Spark Connect architecture](https://spark.apache.org/spark-connect/).
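
For illustration only, here is a minimal PySpark sketch of that decoupling, assuming a Spark Connect server is already running at the hypothetical address `sc://spark-server.example.com:15002`:

```python
from pyspark.sql import SparkSession

# Connect to a remote Spark Connect server; only the thin client runs locally.
# The host and port below are placeholders for your own deployment.
spark = SparkSession.builder.remote("sc://spark-server.example.com:15002").getOrCreate()

# Standard DataFrame and Spark SQL APIs work unchanged over the connection.
spark.range(10).selectExpr("sum(id) AS total").show()

spark.stop()
```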

## Selection Matrix for Spark Distributions

This table guides you to which of the two distributions to select based on the type of Spark workloads.

| Workload Types | Spark Distribution and PySpark Package Mode | Spark Config Change |
|-----------------------------------------------------------------------------------------------------|--------------------------------------------|---------------------------------------------|
| - Only use standard DataFrame and Spark SQL APIs | _connect_ | None |
| - Ability to access and debug Spark from IDE or interact in notebooks | | |
| - Use of thin client to access Spark cluster from non-JVM languages | | |
||||
| - Access to RDD APIs | _classic_ | None |
| - Access to SparkContext API and properties | | |
| - Access to standard DataFrame and Spark SQL APIs | | |
| - Ability to access and debug Spark from IDE or interact in notebooks | | |
| - Access to JVM properties | | |
| - Access to private catalyst APIs: custom analyzer/optimizer rules, custom query plans | | |
||||
| - Able to switch between classic and connect | _classic_ | `spark.api.mode = {classic or connect}` |
||||

## Download Apache Spark&trade;

1. Choose a Spark release:
@@ -55,39 +27,19 @@ This table guides you to which of the two distributions to select based on the t
3. Download Spark: <span id="spanDownloadLink"></span>

4. Verify this release using the <span id="sparkDownloadVerify"></span> and [project release KEYS](https://downloads.apache.org/spark/KEYS) by following these [procedures](https://www.apache.org/info/verification.html).
classic

Note that Spark 4 is pre-built with Scala 2.13 in general, and Spark 3.5+ provides additional pre-built distribution with Scala 2.13.
Note that Spark 3 is pre-built with Scala 2.12 in general and Spark 3.2+ provides additional pre-built distribution with Scala 2.13.

### Link with Spark ###
### Link with Spark
Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q=g:org.apache.spark). You can add a Maven dependency with the following coordinates:

groupId: org.apache.spark
artifactId: spark-core_2.13
version: 4.0.0

### Installing with PyPI ###
Like the two distributions mentioned above, PyPI also offers two PySpark packages. The default is the _classic_ __pyspark__, while the _connect_ version, __pyspark-connect__, depends on __pyspark__.

Use the decision matrix above to select which PyPI PySpark package to use for your Spark workloads. Both <a href="https://pypi.org/project/pyspark/">PySpark</a> package versions are available on PyPI.

### Installing PySpark Connect ###

Since the __pyspark-connect__ package depends on __pyspark__, installing __pyspark-connect__ automatically installs __pyspark__ for you. The __pyspark-connect__ package is mostly empty; it merely sets the Spark config `spark.api.mode` to _connect_ in the underlying __pyspark__ package.

`pip install pyspark-connect==4.0.0`

Thereafter, follow the Spark Connect [quickstart guide](https://spark.apache.org/docs/4.0.0-preview2/api/python/getting_started/quickstart_connect.html) to learn how to use SparkSession.
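
As a rough sketch of the post-install workflow (assuming `pyspark-connect==4.0.0` is installed as above), standard DataFrame code should run unchanged, since the package simply switches the underlying `pyspark` into connect mode:

```python
from pyspark.sql import SparkSession

# With pyspark-connect installed, spark.api.mode=connect is enabled in the
# underlying pyspark package, so a plain builder call is expected to yield a
# connect-mode session.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "label"])
df.groupBy("label").count().show()

spark.stop()
```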

### Installing PySpark Classic ###

Simply run `pip install pyspark==4.0.0`.
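
As a brief illustration of why the _classic_ package is needed for RDD-style workloads, the sketch below (assuming `pyspark==4.0.0` is installed locally) drops down to `SparkContext` and the RDD API, which the selection matrix above routes to the _classic_ distribution:

```python
from pyspark.sql import SparkSession

# Classic mode: the JVM-backed SparkContext and RDD API are available in-process.
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# A small RDD computation of the kind that requires the classic distribution.
rdd = sc.parallelize(range(100))
print(rdd.filter(lambda x: x % 2 == 0).count())  # 50

spark.stop()
```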

### Installing PySpark Client ###
artifactId: spark-core_2.12
version: 3.5.4

Alternatively, if you only want a pure Python thin library with Spark Connect capabilities, install the _pyspark-client_ package: `pip install pyspark-client`.
### Installing with PyPI
<a href="https://pypi.org/project/pyspark/">PySpark</a> is now available on PyPI. To install it, just run `pip install pyspark`.

For more detailed examples of Apache Spark 4.0 features, check the [PySpark User Guide](https://turbo-adventure-1pg35k5.pages.github.io/01-preface.html) and [PySpark installation](https://spark.apache.org/docs/4.0.0-preview2/api/python/getting_started/install.html).

### Installing with Docker

@@ -106,4 +58,4 @@ but they are still available at [Spark release archives](https://archive.apache.

**NOTE**: Previous releases of Spark may be affected by security issues. Please consult the
[Security](security.html) page for a list of known issues that may affect the version you download
before deciding to use it.
before deciding to use it.
122 changes: 6 additions & 116 deletions site/downloads.html
@@ -161,95 +161,6 @@
}
</script>

<h2 id="introduction">Introduction</h2>

<p>Unlike previous Apache Spark™ releases, Spark 4.0 has two distinct distributions: <em>classic</em> and <em>connect</em>. As the names suggest, the <em>classic</em> Spark version is the usual distribution you would expect for any new Spark release. The <em>connect</em> distribution, in contrast, is the version with <a href="https://spark.apache.org/docs/4.0.0-preview2/spark-connect-overview.html">Spark Connect</a> enabled by default. Which one should you download?</p>

<p>Select the <em>connect</em> version if your workloads only use standard <a href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/dataframe.html">DataFrame</a> and <a href="https://spark.apache.org/docs/latest/api/sql/">Spark SQL</a> APIs. Choose the <em>classic</em> version for traditional workloads requiring access to <a href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.html#rdd-apis">RDD APIs</a>, <a href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.html#spark-context-apis">SparkContext APIs</a>, JVM properties, and custom catalyst rules/plans.</p>

<p>If you are not familiar with Spark Connect, the primary benefit is that it provides a stable client API, decoupling the client from the Spark Driver. This makes Spark projects much easier to maintain over time, allowing you to update the Spark Driver and server-side dependencies without having to update the client. To learn more about Spark Connect, and explore its architecture details and benefits, visit here: <a href="https://spark.apache.org/spark-connect/">Spark Connect architecture</a>.</p>

<h2 id="selection-matrix-for-spark-distributions">Selection Matrix for Spark Distributions</h2>

<p>This table guides you to which of the two distributions to select based on the type of Spark workloads.</p>

<table>
<thead>
<tr>
<th>Workload Types</th>
<th>Spark Distribution and PySpark Package Mode</th>
<th>Spark Config Change</th>
</tr>
</thead>
<tbody>
<tr>
<td>- Only use standard DataFrame and Spark SQL APIs</td>
<td><em>connect</em></td>
<td>None</td>
</tr>
<tr>
<td>- Ability to access and debug Spark from IDE or interact in notebooks</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Use of thin client to access Spark cluster from non-JVM languages</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>&#160;</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Access to RDD APIs</td>
<td><em>classic</em></td>
<td>None</td>
</tr>
<tr>
<td>- Access to SparkContext API and properties</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Access to standard DataFrame and Spark SQL APIs</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Ability to access and debug Spark from IDE or interact in notebooks</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Access to JVM properties</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Access to private catalyst APIs: custom analyzer/optimizer rules, custom query plans</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>&#160;</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>- Able to switch between classic and connect</td>
<td><em>classic</em></td>
<td><code class="language-plaintext highlighter-rouge">spark.api.mode = {classic or connect}</code></td>
</tr>
<tr>
<td>&#160;</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
</tbody>
</table>

<h2 id="download-apache-spark">Download Apache Spark&#8482;</h2>

<ol>
@@ -265,43 +176,22 @@ <h2 id="download-apache-spark">Download Apache Spark&#8482;</h2>
<p>Download Spark: <span id="spanDownloadLink"></span></p>
</li>
<li>
<p>Verify this release using the <span id="sparkDownloadVerify"></span> and <a href="https://downloads.apache.org/spark/KEYS">project release KEYS</a> by following these <a href="https://www.apache.org/info/verification.html">procedures</a>.
classic</p>
<p>Verify this release using the <span id="sparkDownloadVerify"></span> and <a href="https://downloads.apache.org/spark/KEYS">project release KEYS</a> by following these <a href="https://www.apache.org/info/verification.html">procedures</a>.</p>
</li>
</ol>

<p>Note that Spark 4 is pre-built with Scala 2.13 in general, and Spark 3.5+ provides additional pre-built distribution with Scala 2.13.</p>
<p>Note that Spark 3 is pre-built with Scala 2.12 in general and Spark 3.2+ provides additional pre-built distribution with Scala 2.13.</p>

<h3 id="link-with-spark">Link with Spark</h3>
<p>Spark artifacts are <a href="https://search.maven.org/search?q=g:org.apache.spark">hosted in Maven Central</a>. You can add a Maven dependency with the following coordinates:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groupId: org.apache.spark
artifactId: spark-core_2.13
version: 4.0.0
artifactId: spark-core_2.12
version: 3.5.4
</code></pre></div></div>

<h3 id="installing-with-pypi">Installing with PyPI</h3>
<p>Like the two distributions mentioned above, PyPI will also have two PySpark package versions. The default is the <em>classic</em> <strong>pyspark</strong>, while the <em>connect</em> version is <strong>pyspark-connect</strong> and is dependent on <strong>pyspark</strong>.</p>

<p>Use the decision matrix above to select which PyPI PySpark package to use for your Spark workloads. Both <a href="https://pypi.org/project/pyspark/">PySpark</a> package versions are available on PyPI.</p>

<h3 id="installing-pyspark-connect">Installing PySpark Connect</h3>

<p>Since the <strong>pyspark-connect</strong> package depends on <strong>pyspark</strong>, installing <strong>pyspark-connect</strong> automatically installs <strong>pyspark</strong> for you. The <strong>pyspark-connect</strong> package is mostly empty; it merely sets the Spark config <code class="language-plaintext highlighter-rouge">spark.api.mode</code> to <em>connect</em> in the underlying <strong>pyspark</strong> package.</p>

<p><code class="language-plaintext highlighter-rouge">pip install pyspark-connect==4.0.0</code></p>

<p>Thereafter, follow the Spark Connect <a href="https://spark.apache.org/docs/4.0.0-preview2/api/python/getting_started/quickstart_connect.html">quickstart guide</a> on how to use SparkSession.</p>

<h3 id="installing-pyspark-classic">Installing PySpark Classic</h3>

<p>Simply use <code class="language-plaintext highlighter-rouge">pip install pyspark==4.0.0</code></p>

<h3 id="installing-pyspark-client">Installing PySpark Client</h3>

<p>Alternatively, if you only want a pure Python thin library with Spark Connect capabilities, install the <em>pyspark-client</em> package: <code class="language-plaintext highlighter-rouge">pip install pyspark-client</code>.</p>

<p>For more detailed examples of Apache Spark 4.0 features, check the <a href="https://turbo-adventure-1pg35k5.pages.github.io/01-preface.html">PySpark User Guide</a> and <a href="https://spark.apache.org/docs/4.0.0-preview2/api/python/getting_started/install.html">PySpark installation</a>.</p>
<h3 id="installing-with-pypi">Installing with PyPi</h3>
<p><a href="https://pypi.org/project/pyspark/">PySpark</a> is now available in pypi. To install just run <code class="language-plaintext highlighter-rouge">pip install pyspark</code>.</p>

<h3 id="installing-with-docker">Installing with Docker</h3>
