diff --git a/docs/en/advanced_settings.md b/docs/en/advanced_settings.md index 84c8dc5751187e..859ca6d7fae55a 100644 --- a/docs/en/advanced_settings.md +++ b/docs/en/advanced_settings.md @@ -17,6 +17,7 @@ sidebar: You can change the following Spark NLP configurations via Spark Configuration: +{:.table-model-big} | Property Name | Default | Meaning | |---------------------------------------------------------|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `spark.jsl.settings.pretrained.cache_folder` | `~/cache_pretrained` | The location to download and extract pretrained `Models` and `Pipelines`. By default, it will be in User's Home directory under `cache_pretrained` directory | @@ -32,6 +33,8 @@ You can change the following Spark NLP configurations via Spark Configuration: | `spark.jsl.settings.onnx.optimizationLevel` | `ALL_OPT` | Sets the optimization level of this options object, overriding the old setting. | | `spark.jsl.settings.onnx.executionMode` | `SEQUENTIAL` | Sets the execution mode of this options object, overriding the old setting. | +
+ ### How to set Spark NLP Configuration **SparkSession:** @@ -93,6 +96,7 @@ spark.jsl.settings.annotator.log_folder dbfs:/PATH_TO_LOGS NOTE: If this is an existing cluster, after adding new configs or changing existing properties you need to restart it. +
### S3 Integration diff --git a/docs/en/hardware_acceleration.md b/docs/en/hardware_acceleration.md index ca87d75debc680..ae32a43ffa1846 100644 --- a/docs/en/hardware_acceleration.md +++ b/docs/en/hardware_acceleration.md @@ -34,6 +34,7 @@ Since the new Transformer models such as BERT for Word and Sentence embeddings a ![Spark NLP CPU vs. GPU](/assets/images/Spark_NLP_CPU_vs._GPU_Transformers_(Word_Embeddings).png) +{:.table-model-big} | Model on GPU | Spark NLP 3.4.3 vs. 4.0.0 | | ----------------- |:-------------------------:| | RoBERTa base | +560%(6.6x) | @@ -72,6 +73,7 @@ Here we compare the last release of Spark NLP 3.4.3 on CPU (normal) with Spark N ![Spark NLP 3.4.4 CPU vs. Spark NLP 4.0 CPU with oneDNN](/assets/images/Spark_NLP_3.4_on_CPU_vs._Spark_NLP_4.0_on_CPU_with_oneDNN.png) +{:.table-model-big} | Model on CPU | 3.4.x vs. 4.0.0 with oneDNN | | ----------------- |:------------------------:| | BERT Base | +47% | diff --git a/docs/en/install.md b/docs/en/install.md index eb4cb54d728905..728a525f0118be 100644 --- a/docs/en/install.md +++ b/docs/en/install.md @@ -106,6 +106,8 @@ spark = SparkSession.builder \ If using local jars, you can use `spark.jars` instead for comma-delimited jar files. For cluster setups, of course, you'll have to put the jars in a reachable location for all driver and executor nodes. +
+ ### Python without explicit Pyspark installation ### Pip/Conda @@ -306,7 +308,6 @@ as expected.5.4.1
- ## Command line Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, Apache Spark 3.2.x, Apache Spark 3.3.x, Apache Spark 3.4.x, and Apache Spark 3.5.x @@ -379,6 +380,8 @@ spark-shell \ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` +
+ ## Installation for M1 & M2 Chips ### Scala and Java for M1 @@ -524,6 +527,8 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is available to driver path +
+ ## Python in Zeppelin Apart from the previous step, install the python module through pip @@ -546,6 +551,8 @@ install the pip library with (e.g. `python3`). An alternative option would be to set `SPARK_SUBMIT_OPTIONS` (zeppelin-env.sh) and make sure `--packages` is there as shown earlier since it includes both scala and python side installation. +
+ ## Jupyter Notebook 5.4.1 **Recommended:** @@ -582,6 +589,8 @@ Alternatively, you can mix in using `--jars` option for pyspark + `pip install s If not using pyspark at all, you'll have to run the instructions pointed [here](#python-without-explicit-pyspark-installation) +
+ ## Databricks Cluster 1. Create a cluster if you don't have one already @@ -605,6 +614,8 @@ NOTE: Databricks' runtimes support different Apache Spark major releases. Please NLP Maven package name (Maven Coordinate) for your runtime from our [Packages Cheatsheet](https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet) +
+ ## EMR Cluster To launch EMR clusters with Apache Spark/PySpark and Spark NLP correctly you need to have bootstrap and software @@ -670,6 +681,8 @@ aws emr create-cluster \ --profile ``` +
+ ## GCP Dataproc 1. Create a cluster if you don't have one already as follows. @@ -733,6 +746,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \ Spark NLP *5.4.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x +{:.table-model-big} | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x | |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------| | 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO | @@ -750,6 +764,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github ## Scala and Python Support +{:.table-model-big} | Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 | |-----------|------------|------------|------------|------------|------------|------------|------------| | 5.3.x | NO | YES | YES | YES | YES | NO | YES | @@ -1260,6 +1275,7 @@ PipelineModel.load("/tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/") - Since you are downloading and loading models/pipelines manually, this means Spark NLP is not downloading the most recent and compatible models/pipelines for you. Choosing the right model/pipeline is on you - If you are local, you can load the model/pipeline from your local FileSystem, however, if you are in a cluster setup you need to put the model/pipeline on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/`) +
## Compiled JARs @@ -1285,6 +1301,8 @@ sbt -Dis_gpu=true assembly sbt -Dis_silicon=true assembly ``` +
+ ### Using the jar manually If for some reason you need to use the JAR, you can either download the Fat JARs provided here or download it diff --git a/docs/en/mlflow.md b/docs/en/mlflow.md index a1c68ef64dfa7b..7f2ceabe6f3f20 100644 --- a/docs/en/mlflow.md +++ b/docs/en/mlflow.md @@ -133,6 +133,8 @@ import pandas as pd import glob ``` +
+ ### Spark NLP imports ``` import sparknlp @@ -172,6 +174,8 @@ We will be showcasing the serialization and experiment tracking of `NERDLApproac There is one specific util that is able to parse the log of that approach in order to extract the metrics and charts. Let's get it. +
+ ### Ner Log Parser Util `!wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/ner_image_log_parser.py` @@ -179,6 +183,8 @@ Now, let's import the library: `import ner_image_log_parser` +
+ ### Starting a SparkNLP session It's important we create a Spark NLP Session using the Session Builder, since we need to specify the jars not only of Spark NLP, but also of MLFlow. @@ -198,6 +204,8 @@ def start(): spark = start() ``` +
+ ### Training dataset preparation Let's download some training and test datasets: ``` @@ -221,6 +229,8 @@ TRAINING_SIZE = training_data.count() TRAINING_SIZE ``` +
+ ### Hyperparameters configuration Let's configure our hyperparameter values. ``` @@ -236,6 +246,8 @@ RANDOM_SEED = 0 # Adapt me to your experiment VALIDATION_SPLIT = 0.1 # Adapt me to your experiment ``` +
+ ### Creating the experiment Now, we are ready to instantiate an experiment in MLFlow ``` @@ -244,6 +256,8 @@ EXPERIMENT_ID = mlflow.create_experiment(f"{MODEL_NAME}_{EXPERIMENT_NAME}") Each time you want to test a different thing, change the EXPERIMENT_NAME and rerun the line above to create a new entry in the experiment. By changing the experiment name, a new experiment ID will be generated. Each experiment ID groups all runs in separates folder inside `./mlruns`. +
+ ### Pipeline creation ``` document = DocumentAssembler()\ @@ -300,11 +314,15 @@ ner_training_pipeline = Pipeline(stages = ner_preprocessing_pipeline.getStages() ## Preparing inference objects Now, let's prepare the inference as well, since we will train and infer afterwards, and store all the results of training and inference as artifacts in our MLFlow object. +
+ ### Test dataset preparation ``` test_data = CoNLL().readDataset(spark, TEST_DATASET) ``` +
+ ### Setting the names of the inference objects ``` INFERENCE_NAME = "inference.parquet" # This is the name of the results inference on the test dataset, serialized in parquet, @@ -520,11 +538,11 @@ Now, we just need to launch the MLFLow UI to see:
## Some example screenshots -![](/assets/images/mlflow/mlflow10.png) -![](/assets/images/mlflow/mlflow11.png) -![](/assets/images/mlflow/mlflow12.png) -![](/assets/images/mlflow/mlflow13.png) -![](/assets/images/mlflow/mlflow14.png) -![](/assets/images/mlflow/mlflow15.png) +![MLFLow](/assets/images/mlflow/mlflow10.png) +![MLFLow](/assets/images/mlflow/mlflow11.png) +![MLFLow](/assets/images/mlflow/mlflow12.png) +![MLFLow](/assets/images/mlflow/mlflow13.png) +![MLFLow](/assets/images/mlflow/mlflow14.png) +![MLFLow](/assets/images/mlflow/mlflow15.png)
\ No newline at end of file diff --git a/docs/en/pipelines.md b/docs/en/pipelines.md index 0204f8c62b88f9..0300e381d3de29 100644 --- a/docs/en/pipelines.md +++ b/docs/en/pipelines.md @@ -62,6 +62,8 @@ annotation.select("entities.result").show(false) */ ``` +
+ #### Showing Available Pipelines There are functions in Spark NLP that will list all the available Pipelines @@ -105,6 +107,8 @@ ResourceDownloader.showPublicPipelines(lang = "en", version = "3.1.0") */ ``` +
+ #### Please check out our Models Hub for the full list of [pre-trained pipelines](https://sparknlp.org/models) with examples, demos, benchmarks, and more ### Models @@ -138,6 +142,8 @@ val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_155653145734 .setOutputCol("pos") ``` +
+ #### Showing Available Models There are functions in Spark NLP that will list all the available Models diff --git a/docs/en/training.md b/docs/en/training.md index 497d5025e3fe41..319ec377186744 100644 --- a/docs/en/training.md +++ b/docs/en/training.md @@ -138,7 +138,7 @@ All of these graphs use an LSTM of size 128 and number of chars 100 In case, your train dataset has a different number of tags, embeddings dimension, number of chars and LSTM size combinations shown in the table above, `NerDLApproach` will raise an **IllegalArgumentException** exception during runtime with the message below: -*Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check https://sparknlp.org/docs/en/graph for instructions to generate the required graph.* +*Graph [parameter] should be [value]: Could not find a suitable tensorflow graph for embeddings dim: [value] tags: [value] nChars: [value]. Check [https://sparknlp.org/docs/en/graph](https://sparknlp.org/docs/en/graph) for instructions to generate the required graph.* To overcome this exception message we have to follow these steps: diff --git a/docs/en/transformer_entries/AlbertEmbeddings.md b/docs/en/transformer_entries/AlbertEmbeddings.md index 2dba88289c8839..f2be9b08d6ec61 100644 --- a/docs/en/transformer_entries/AlbertEmbeddings.md +++ b/docs/en/transformer_entries/AlbertEmbeddings.md @@ -10,6 +10,7 @@ All official Albert releases by google in TF-HUB are supported with this Albert **Ported TF-Hub Models:** +{:.table-model-big} | Spark NLP Model | TF-Hub Model | Model Properties | | -------------------------- | ----------------------------------------------------------- | ------------------------------------------------------ | | `"albert_base_uncased"` | [albert_base](https://tfhub.dev/google/albert_base/3) | 768-embed-dim, 12-layer, 12-heads, 12M parameters | @@ -39,9 +40,9 @@ and the [AlbertEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blo [ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS](https://arxiv.org/pdf/1909.11942.pdf) -https://github.com/google-research/ALBERT +[https://github.com/google-research/ALBERT](https://github.com/google-research/ALBERT) -https://tfhub.dev/s?q=albert +[https://tfhub.dev/s?q=albert](https://tfhub.dev/s?q=albert) **Paper abstract:** diff --git a/docs/en/transformer_entries/AlbertForQuestionAnswering.md b/docs/en/transformer_entries/AlbertForQuestionAnswering.md index f4838bc897849c..320f5de84b80fc 100644 --- a/docs/en/transformer_entries/AlbertForQuestionAnswering.md +++ b/docs/en/transformer_entries/AlbertForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [AlbertForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/AlbertForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/AlbertForSequenceClassification.md b/docs/en/transformer_entries/AlbertForSequenceClassification.md index 105d2d45b3390d..7e3c4722f78ad7 100644 --- a/docs/en/transformer_entries/AlbertForSequenceClassification.md +++ b/docs/en/transformer_entries/AlbertForSequenceClassification.md @@ -19,7 +19,7 @@ The default model is `"albert_base_sequence_classifier_imdb"`, if no name is pro For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [AlbertForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/AlbertForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/BartTransformer.md b/docs/en/transformer_entries/BartTransformer.md index c101b44c8ed247..1de8fba65b4b64 100644 --- a/docs/en/transformer_entries/BartTransformer.md +++ b/docs/en/transformer_entries/BartTransformer.md @@ -43,7 +43,7 @@ For extended examples of usage, see **References:** - [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://aclanthology.org/2020.acl-main.703.pdf) -- https://github.com/pytorch/fairseq +- [https://github.com/pytorch/fairseq](https://github.com/pytorch/fairseq) **Paper Abstract:** diff --git a/docs/en/transformer_entries/BertEmbeddings.md b/docs/en/transformer_entries/BertEmbeddings.md index fa1ae450dab5e1..2b866bfe0b11a7 100644 --- a/docs/en/transformer_entries/BertEmbeddings.md +++ b/docs/en/transformer_entries/BertEmbeddings.md @@ -23,7 +23,7 @@ and the [BertEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/ [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) -https://github.com/google-research/bert +[https://github.com/google-research/bert](https://github.com/google-research/bert) **Paper abstract** diff --git a/docs/en/transformer_entries/BertForQuestionAnswering.md b/docs/en/transformer_entries/BertForQuestionAnswering.md index 23f69907cb9726..0683f9794f01eb 100644 --- a/docs/en/transformer_entries/BertForQuestionAnswering.md +++ b/docs/en/transformer_entries/BertForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [BertForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/BertForSequenceClassification.md b/docs/en/transformer_entries/BertForSequenceClassification.md index 196cd2749f5749..34a985a0ee8d89 100644 --- a/docs/en/transformer_entries/BertForSequenceClassification.md +++ b/docs/en/transformer_entries/BertForSequenceClassification.md @@ -18,7 +18,7 @@ The default model is `"bert_base_sequence_classifier_imdb"`, if no name is provi For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [BertForSequenceClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/BertForZeroShotClassification.md b/docs/en/transformer_entries/BertForZeroShotClassification.md index 0adace9e353674..4d642365325583 100644 --- a/docs/en/transformer_entries/BertForZeroShotClassification.md +++ b/docs/en/transformer_entries/BertForZeroShotClassification.md @@ -28,7 +28,7 @@ For available pretrained models please see the [Models Hub](https://nlp.johnsnowlabs.com/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [BertForZeroShotClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/main/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/BertForZeroShotClassification.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/CLIPForZeroShotClassification.md b/docs/en/transformer_entries/CLIPForZeroShotClassification.md index 99e62617c69e88..e4a42ba0c0f5ae 100644 --- a/docs/en/transformer_entries/CLIPForZeroShotClassification.md +++ b/docs/en/transformer_entries/CLIPForZeroShotClassification.md @@ -25,7 +25,7 @@ For available pretrained models please see the Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [CLIPForZeroShotClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/CLIPForZeroShotClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/CamemBertEmbeddings.md b/docs/en/transformer_entries/CamemBertEmbeddings.md index dd059c12ab6b5b..add2ee047f697f 100644 --- a/docs/en/transformer_entries/CamemBertEmbeddings.md +++ b/docs/en/transformer_entries/CamemBertEmbeddings.md @@ -24,13 +24,13 @@ For extended examples of usage, see the and the [CamemBertEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/CamemBertEmbeddingsTestSpec.scala). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). **Sources** : [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) -https://huggingface.co/camembert +[https://huggingface.co/camembert](https://huggingface.co/camembert) **Paper abstract** diff --git a/docs/en/transformer_entries/CamemBertForQuestionAnswering.md b/docs/en/transformer_entries/CamemBertForQuestionAnswering.md index 61347240c6c1be..c49dc422e01a50 100644 --- a/docs/en/transformer_entries/CamemBertForQuestionAnswering.md +++ b/docs/en/transformer_entries/CamemBertForQuestionAnswering.md @@ -21,7 +21,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [CamemBertForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/CamemBertForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/CamemBertForSequenceClassification.md b/docs/en/transformer_entries/CamemBertForSequenceClassification.md index bad2035f5bf7db..bb78722d7c38fd 100644 --- a/docs/en/transformer_entries/CamemBertForSequenceClassification.md +++ b/docs/en/transformer_entries/CamemBertForSequenceClassification.md @@ -22,7 +22,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [CamemBertForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/CamemBertForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/CamemBertForTokenClassification.md b/docs/en/transformer_entries/CamemBertForTokenClassification.md index 7d8b446433314e..7159c4542f96e1 100644 --- a/docs/en/transformer_entries/CamemBertForTokenClassification.md +++ b/docs/en/transformer_entries/CamemBertForTokenClassification.md @@ -21,7 +21,7 @@ For available pretrained models please see the and the [CamemBertForTokenClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/CamemBertForTokenClassificationTestSpec.scala). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/ConvNextForImageClassification.md b/docs/en/transformer_entries/ConvNextForImageClassification.md index 9e664a7bb4ca42..7a5eb8dd186973 100644 --- a/docs/en/transformer_entries/ConvNextForImageClassification.md +++ b/docs/en/transformer_entries/ConvNextForImageClassification.md @@ -25,7 +25,7 @@ For available pretrained models please see the Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [ConvNextForImageClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/ConvNextForImageClassificationTestSpec.scala). diff --git a/docs/en/transformer_entries/DeBertaEmbeddings.md b/docs/en/transformer_entries/DeBertaEmbeddings.md index 56302333caf17f..7816d94e701a2c 100644 --- a/docs/en/transformer_entries/DeBertaEmbeddings.md +++ b/docs/en/transformer_entries/DeBertaEmbeddings.md @@ -17,19 +17,19 @@ The default model is `"deberta_v3_base"`, if no name is provided. For extended examples see [DeBertaEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/DeBertaEmbeddingsTestSpec.scala). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa. **Sources:** -https://github.com/microsoft/DeBERTa +[https://github.com/microsoft/DeBERTa](https://github.com/microsoft/DeBERTa) -https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/ +[https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/) **Paper abstract:** -*Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.* +*Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at [https://github.com/microsoft/DeBERTa](https://github.com/microsoft/DeBERTa).* {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/DeBertaForQuestionAnswering.md b/docs/en/transformer_entries/DeBertaForQuestionAnswering.md index b6ac208d427206..689e63f4b0f1f2 100644 --- a/docs/en/transformer_entries/DeBertaForQuestionAnswering.md +++ b/docs/en/transformer_entries/DeBertaForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [DeBertaForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/DeBertaForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/DeBertaForSequenceClassification.md b/docs/en/transformer_entries/DeBertaForSequenceClassification.md index 12a5fc145af72d..b45c5abd9c6d36 100644 --- a/docs/en/transformer_entries/DeBertaForSequenceClassification.md +++ b/docs/en/transformer_entries/DeBertaForSequenceClassification.md @@ -21,7 +21,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [DeBertaForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/DeBertaForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/DeBertaForTokenClassification.md b/docs/en/transformer_entries/DeBertaForTokenClassification.md index 9b08d5d7f3db73..7e3c21f2d2132e 100644 --- a/docs/en/transformer_entries/DeBertaForTokenClassification.md +++ b/docs/en/transformer_entries/DeBertaForTokenClassification.md @@ -24,7 +24,7 @@ and the [DeBertaForTokenClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/DeBertaForTokenClassificationTestSpec.scala). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/DistilBertForQuestionAnswering.md b/docs/en/transformer_entries/DistilBertForQuestionAnswering.md index ee2e0a2a54fdd5..3c6f2902944ac2 100644 --- a/docs/en/transformer_entries/DistilBertForQuestionAnswering.md +++ b/docs/en/transformer_entries/DistilBertForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [DistilBertForSequenceClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/DistilBertForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/DistilBertForSequenceClassification.md b/docs/en/transformer_entries/DistilBertForSequenceClassification.md index 2fa550d0b94099..6ee97629927714 100644 --- a/docs/en/transformer_entries/DistilBertForSequenceClassification.md +++ b/docs/en/transformer_entries/DistilBertForSequenceClassification.md @@ -19,7 +19,7 @@ The default model is `"distilbert_base_sequence_classifier_imdb"`, if no name is For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [DistilBertForSequenceClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/DistilBertForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/DistilBertForZeroShotClassification.md b/docs/en/transformer_entries/DistilBertForZeroShotClassification.md index 296173299837a5..4d918a3e754f5f 100644 --- a/docs/en/transformer_entries/DistilBertForZeroShotClassification.md +++ b/docs/en/transformer_entries/DistilBertForZeroShotClassification.md @@ -29,7 +29,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/ElmoEmbeddings.md b/docs/en/transformer_entries/ElmoEmbeddings.md index c4da92c9811bf5..50a7c3949aede6 100644 --- a/docs/en/transformer_entries/ElmoEmbeddings.md +++ b/docs/en/transformer_entries/ElmoEmbeddings.md @@ -35,7 +35,7 @@ and the [ElmoEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/ **Sources:** -https://tfhub.dev/google/elmo/3 +[https://tfhub.dev/google/elmo/3](https://tfhub.dev/google/elmo/3) [Deep contextualized word representations](https://arxiv.org/abs/1802.05365) diff --git a/docs/en/transformer_entries/GPT2Transformer.md b/docs/en/transformer_entries/GPT2Transformer.md index 90e5d9af71020b..41b7afa053ecf1 100644 --- a/docs/en/transformer_entries/GPT2Transformer.md +++ b/docs/en/transformer_entries/GPT2Transformer.md @@ -32,7 +32,7 @@ For extended examples of usage, see [GPT2TestSpec](https://github.com/JohnSnowLa **Sources:** - [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) - - https://github.com/openai/gpt-2 + - [https://github.com/openai/gpt-2](https://github.com/openai/gpt-2) **Paper Abstract:** diff --git a/docs/en/transformer_entries/HubertForCTC.md b/docs/en/transformer_entries/HubertForCTC.md index 9fa0f65f5a7a9f..8afec2ef701752 100644 --- a/docs/en/transformer_entries/HubertForCTC.md +++ b/docs/en/transformer_entries/HubertForCTC.md @@ -26,7 +26,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [HubertForCTCTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/audio/HubertForCTCTest.scala). diff --git a/docs/en/transformer_entries/InstructorEmbeddings.md b/docs/en/transformer_entries/InstructorEmbeddings.md index 5be60501a4b29c..b02cfe1c5f42dd 100644 --- a/docs/en/transformer_entries/InstructorEmbeddings.md +++ b/docs/en/transformer_entries/InstructorEmbeddings.md @@ -47,7 +47,7 @@ the previous best model, achieves state-of-the-art performance, with an average 3.4% compared to the previous best results on the 70 diverse datasets. Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets. Our model, code, and -data are available at this https URL. https://instructor-embedding.github.io/* +data are available at this https URL. [https://instructor-embedding.github.io/](https://instructor-embedding.github.io/)* {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/LongformerEmbeddings.md b/docs/en/transformer_entries/LongformerEmbeddings.md index 1e0043dce029b6..7c170cd141043e 100644 --- a/docs/en/transformer_entries/LongformerEmbeddings.md +++ b/docs/en/transformer_entries/LongformerEmbeddings.md @@ -29,7 +29,7 @@ In contrast to most prior work, we also pretrain Longformer and finetune it on a Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization dataset.* -The original code can be found ```here``` https://github.com/allenai/longformer. +The original code can be found ```here``` [https://github.com/allenai/longformer](https://github.com/allenai/longformer). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/LongformerForQuestionAnswering.md b/docs/en/transformer_entries/LongformerForQuestionAnswering.md index 69ac0299360a4d..24b328c0812860 100644 --- a/docs/en/transformer_entries/LongformerForQuestionAnswering.md +++ b/docs/en/transformer_entries/LongformerForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [LongformerForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/LongformerForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/LongformerForSequenceClassification.md b/docs/en/transformer_entries/LongformerForSequenceClassification.md index bc7c5d3f1c9efd..de86cac5013d02 100644 --- a/docs/en/transformer_entries/LongformerForSequenceClassification.md +++ b/docs/en/transformer_entries/LongformerForSequenceClassification.md @@ -19,7 +19,7 @@ The default model is `"longformer_base_sequence_classifier_imdb"`, if no name is For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [LongformerForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/LongformerForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/MPNetForQuestionAnswering.md b/docs/en/transformer_entries/MPNetForQuestionAnswering.md index 369e1078c21a41..2c0368ebfdf8aa 100644 --- a/docs/en/transformer_entries/MPNetForQuestionAnswering.md +++ b/docs/en/transformer_entries/MPNetForQuestionAnswering.md @@ -21,7 +21,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [MPNetForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/MPNetForSequenceClassification.md b/docs/en/transformer_entries/MPNetForSequenceClassification.md index 947f7ce1c40d82..43dcad0b88641d 100644 --- a/docs/en/transformer_entries/MPNetForSequenceClassification.md +++ b/docs/en/transformer_entries/MPNetForSequenceClassification.md @@ -23,7 +23,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [MPNetForSequenceClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/RoBertaEmbeddings.md b/docs/en/transformer_entries/RoBertaEmbeddings.md index fc92f5d5d1dba2..3040f6602955e2 100644 --- a/docs/en/transformer_entries/RoBertaEmbeddings.md +++ b/docs/en/transformer_entries/RoBertaEmbeddings.md @@ -37,7 +37,7 @@ Tips: - RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a different pretraining scheme. - RoBERTa doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:``) -The original code can be found ```here``` https://github.com/pytorch/fairseq/tree/master/examples/roberta. +The original code can be found ```here``` [https://github.com/pytorch/fairseq/tree/master/examples/roberta](https://github.com/pytorch/fairseq/tree/master/examples/roberta). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/RoBertaForQuestionAnswering.md b/docs/en/transformer_entries/RoBertaForQuestionAnswering.md index 89fabab31b7448..438e7d51027579 100644 --- a/docs/en/transformer_entries/RoBertaForQuestionAnswering.md +++ b/docs/en/transformer_entries/RoBertaForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [RoBertaForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/RoBertaForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/RoBertaForSequenceClassification.md b/docs/en/transformer_entries/RoBertaForSequenceClassification.md index e28390e177e60f..33948f67f58cfb 100644 --- a/docs/en/transformer_entries/RoBertaForSequenceClassification.md +++ b/docs/en/transformer_entries/RoBertaForSequenceClassification.md @@ -19,7 +19,7 @@ The default model is `"roberta_base_sequence_classifier_imdb"`, if no name is pr For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [RoBertaForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/RoBertaForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/RoBertaForZeroShotClassification.md b/docs/en/transformer_entries/RoBertaForZeroShotClassification.md index 07b7bb3c170248..fb81742ceaa007 100644 --- a/docs/en/transformer_entries/RoBertaForZeroShotClassification.md +++ b/docs/en/transformer_entries/RoBertaForZeroShotClassification.md @@ -28,7 +28,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). {%- endcapture -%} diff --git a/docs/en/transformer_entries/RoBertaSentenceEmbeddings.md b/docs/en/transformer_entries/RoBertaSentenceEmbeddings.md index 5983b6be14f319..e83bd3d2a3ffbc 100644 --- a/docs/en/transformer_entries/RoBertaSentenceEmbeddings.md +++ b/docs/en/transformer_entries/RoBertaSentenceEmbeddings.md @@ -35,7 +35,7 @@ Tips: - RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a different pretraining scheme. - RoBERTa doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:``) -The original code can be found ```here``` https://github.com/pytorch/fairseq/tree/master/examples/roberta. +The original code can be found ```here``` [https://github.com/pytorch/fairseq/tree/master/examples/roberta](https://github.com/pytorch/fairseq/tree/master/examples/roberta). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/SpanBertCoref.md b/docs/en/transformer_entries/SpanBertCoref.md index 888e444cca2ce6..a94094097a1c24 100644 --- a/docs/en/transformer_entries/SpanBertCoref.md +++ b/docs/en/transformer_entries/SpanBertCoref.md @@ -21,7 +21,7 @@ The default model is `"spanbert_base_coref"`, if no name is provided. For availa models please see the [Models Hub](https://sparknlp.org/models). **References:** -https://github.com/mandarjoshi90/coref +[https://github.com/mandarjoshi90/coref](https://github.com/mandarjoshi90/coref) {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/SwinForImageClassification.md b/docs/en/transformer_entries/SwinForImageClassification.md index 23e09bf5b20d18..ce99fe27b4e564 100644 --- a/docs/en/transformer_entries/SwinForImageClassification.md +++ b/docs/en/transformer_entries/SwinForImageClassification.md @@ -28,7 +28,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Image+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [SwinForImageClassificationTest](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/SwinForImageClassificationTest.scala). diff --git a/docs/en/transformer_entries/T5Transformer.md b/docs/en/transformer_entries/T5Transformer.md index 1b2e0f9438fea7..6cc2baeec6e6ea 100644 --- a/docs/en/transformer_entries/T5Transformer.md +++ b/docs/en/transformer_entries/T5Transformer.md @@ -28,7 +28,7 @@ and the [T5TestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/t **Sources:** - [Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) - [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) - - https://github.com/google-research/text-to-text-transfer-transformer + - [https://github.com/google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer) **Paper Abstract:** diff --git a/docs/en/transformer_entries/UniversalSentenceEncoder.md b/docs/en/transformer_entries/UniversalSentenceEncoder.md index ff45d0c48f151b..d7065ba30bd800 100644 --- a/docs/en/transformer_entries/UniversalSentenceEncoder.md +++ b/docs/en/transformer_entries/UniversalSentenceEncoder.md @@ -21,7 +21,7 @@ and the [UniversalSentenceEncoderTestSpec](https://github.com/JohnSnowLabs/spark [Universal Sentence Encoder](https://arxiv.org/abs/1803.11175) -https://tfhub.dev/google/universal-sentence-encoder/2 +[https://tfhub.dev/google/universal-sentence-encoder/2](https://tfhub.dev/google/universal-sentence-encoder/2) **Paper abstract:** diff --git a/docs/en/transformer_entries/ViTForImageClassification.md b/docs/en/transformer_entries/ViTForImageClassification.md index f482e2064ba629..d59e15a165b1f0 100644 --- a/docs/en/transformer_entries/ViTForImageClassification.md +++ b/docs/en/transformer_entries/ViTForImageClassification.md @@ -20,7 +20,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Image+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [ViTImageClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/ViTImageClassificationTestSpec.scala). diff --git a/docs/en/transformer_entries/VisionEncoderDecoderForImageCaptioning.md b/docs/en/transformer_entries/VisionEncoderDecoderForImageCaptioning.md index 8c0de07ee896c3..a310c3cce9c5f3 100644 --- a/docs/en/transformer_entries/VisionEncoderDecoderForImageCaptioning.md +++ b/docs/en/transformer_entries/VisionEncoderDecoderForImageCaptioning.md @@ -22,7 +22,7 @@ For available pretrained models please see the Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [VisionEncoderDecoderTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/VisionEncoderDecoderTestSpec.scala). diff --git a/docs/en/transformer_entries/Wav2Vec2ForCTC.md b/docs/en/transformer_entries/Wav2Vec2ForCTC.md index f4b1e4d8f1b1dd..6a232e839c4790 100644 --- a/docs/en/transformer_entries/Wav2Vec2ForCTC.md +++ b/docs/en/transformer_entries/Wav2Vec2ForCTC.md @@ -25,7 +25,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [Wav2Vec2ForCTCTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/audio/Wav2Vec2ForCTCTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/WhisperForCTC.md b/docs/en/transformer_entries/WhisperForCTC.md index ed07efab8ce0e1..65073b531fb79a 100644 --- a/docs/en/transformer_entries/WhisperForCTC.md +++ b/docs/en/transformer_entries/WhisperForCTC.md @@ -31,7 +31,7 @@ The default model is `"asr_whisper_tiny_opt"`, if no name is provided. For available pretrained models please see the [Models Hub](https://sparknlp.org/models). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and to see more extended examples, see [WhisperForCTCTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/audio/WhisperForCTCTest.scala). diff --git a/docs/en/transformer_entries/XlmRoBertaForQuestionAnswering.md b/docs/en/transformer_entries/XlmRoBertaForQuestionAnswering.md index 1d6382a8a74057..82d1957b425dcc 100644 --- a/docs/en/transformer_entries/XlmRoBertaForQuestionAnswering.md +++ b/docs/en/transformer_entries/XlmRoBertaForQuestionAnswering.md @@ -19,7 +19,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Question+Answering). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [XlmRoBertaForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/XlmRoBertaForQuestionAnsweringTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/XlmRoBertaForSequenceClassification.md b/docs/en/transformer_entries/XlmRoBertaForSequenceClassification.md index 5a034f3726f5e9..b9c4e35cedf318 100644 --- a/docs/en/transformer_entries/XlmRoBertaForSequenceClassification.md +++ b/docs/en/transformer_entries/XlmRoBertaForSequenceClassification.md @@ -19,7 +19,7 @@ The default model is `"xlm_roberta_base_sequence_classifier_imdb"`, if no name i For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [XlmRoBertaForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/XlmRoBertaForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformer_entries/XlmRoBertaForZeroShotClassification.md b/docs/en/transformer_entries/XlmRoBertaForZeroShotClassification.md index 55cef300c7e6ae..df996477082a2d 100644 --- a/docs/en/transformer_entries/XlmRoBertaForZeroShotClassification.md +++ b/docs/en/transformer_entries/XlmRoBertaForZeroShotClassification.md @@ -29,7 +29,7 @@ For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). To see which models are compatible and how to import them see -https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +[https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). {%- endcapture -%} {%- capture input_anno -%} diff --git a/docs/en/transformer_entries/XlnetEmbeddings.md b/docs/en/transformer_entries/XlnetEmbeddings.md index f711892642a87c..aca532a5445c73 100644 --- a/docs/en/transformer_entries/XlnetEmbeddings.md +++ b/docs/en/transformer_entries/XlnetEmbeddings.md @@ -16,6 +16,7 @@ These word embeddings represent the outputs generated by the XLNet models. Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended. +{:.table-model-big} | Spark NLP Model | Google Model | Model Properties | | ---------------------- | ---------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | | `"xlnet_large_cased"` | [XLNet-Large](https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip) | 24-layer, 1024-hidden, 16-heads | @@ -41,7 +42,7 @@ and the [XlnetEmbeddingsTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) -https://github.com/zihangdai/xlnet +[https://github.com/zihangdai/xlnet](https://github.com/zihangdai/xlnet) **Paper abstract:** diff --git a/docs/en/transformer_entries/XlnetForSequenceClassification.md b/docs/en/transformer_entries/XlnetForSequenceClassification.md index 892d57430893b5..b3f4d37359b52e 100644 --- a/docs/en/transformer_entries/XlnetForSequenceClassification.md +++ b/docs/en/transformer_entries/XlnetForSequenceClassification.md @@ -17,7 +17,7 @@ The default model is `"xlnet_base_sequence_classifier_imdb"`, if no name is prov For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Text+Classification). Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are -compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. +compatible and how to import them see [https://github.com/JohnSnowLabs/spark-nlp/discussions/5669](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669). and the [XlnetForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/XlnetForSequenceClassificationTestSpec.scala). {%- endcapture -%} diff --git a/docs/en/transformers.md b/docs/en/transformers.md index e17cf3de529afc..fb8e07d8407f11 100644 --- a/docs/en/transformers.md +++ b/docs/en/transformers.md @@ -47,6 +47,7 @@ We have extended support for `HuggingFace` 🤗 and `TF Hub` exported models sin - Under development ❎ - Not supported ❌ +{:.table-model-big} | Spark NLP | TF Hub | HuggingFace | ONNX | Model Architecture | | :-------------------------------------------- | :----- | :---------- | :--- | :--------------------------------------------------------------------------------------------------------------------------------------------------------- | | AlbertEmbeddings | ✅ | ✅ | ✅ | ALBERT | @@ -118,6 +119,7 @@ We have extended support for `HuggingFace` 🤗 and `TF Hub` exported models sin #### HuggingFace, Optimum, PyTorch, and ONNX Runtime to Spark NLP (ONNX) +{:.table-model-big} | Spark NLP | Notebooks | Colab | | :---------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | AlbertForQuestionAnswering | [HuggingFace ONNX in Spark NLP AlbertForQuestionAnswering](https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_AlbertForQuestionAnswering.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_AlbertForQuestionAnswering.ipynb) | @@ -149,6 +151,7 @@ We have extended support for `HuggingFace` 🤗 and `TF Hub` exported models sin #### HuggingFace to Spark NLP (TensorFlow) +{:.table-model-big} | Spark NLP | Notebooks | Colab | | :---------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | AlbertEmbeddings | [HuggingFace in Spark NLP - ALBERT](https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/HuggingFace%20in%20Spark%20NLP%20-%20ALBERT.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/HuggingFace%20in%20Spark%20NLP%20-%20ALBERT.ipynb) | @@ -198,6 +201,7 @@ We have extended support for `HuggingFace` 🤗 and `TF Hub` exported models sin #### TF Hub to Spark NLP +{:.table-model-big} | Spark NLP | TF Hub Notebooks | Colab | | :--------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | AlbertEmbeddings | [TF Hub in Spark NLP - ALBERT](https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/TF%20Hub%20in%20Spark%20NLP%20-%20ALBERT.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/TF%20Hub%20in%20Spark%20NLP%20-%20ALBERT.ipynb) | diff --git a/docs/index.md b/docs/index.md index 154824a38a2398..729f27cdc76b84 100644 --- a/docs/index.md +++ b/docs/index.md @@ -362,7 +362,7 @@ data: is_row: true - title: image: - src: https://u-paris.fr/wp-content/uploads/2019/03/Universite_Paris_logo_horizontal.jpg + src: https://u-paris.fr/wp-content/uploads/2022/03/Universite_Paris-Cite-logo.jpeg url: https://u-paris.fr/en/ style: "" is_row: true