diff --git a/docs/_posts/ahmedlone127/2024-09-02-bulbert_chitanka_model_bg.md b/docs/_posts/ahmedlone127/2024-09-02-bulbert_chitanka_model_bg.md new file mode 100644 index 00000000000000..bc57b57b03cc8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-02-bulbert_chitanka_model_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian bulbert_chitanka_model BertEmbeddings from mor40 +author: John Snow Labs +name: bulbert_chitanka_model +date: 2024-09-02 +tags: [bg, open_source, onnx, embeddings, bert] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bulbert_chitanka_model` is a Bulgarian model originally trained by mor40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bulbert_chitanka_model_bg_5.5.0_3.0_1725318518639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bulbert_chitanka_model_bg_5.5.0_3.0_1725318518639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bulbert_chitanka_model","bg") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bulbert_chitanka_model","bg") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
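Once the pipeline has run, the `embeddings` column (as named by `setOutputCol` above) holds one annotation per token. A minimal sketch, continuing from `pipelineDF` in the Python example, of how one might pull out the token texts and their vectors:

```python
from pyspark.sql.functions import col, explode

# One row per token: the token text and its BERT vector
pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"),
            col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```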
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bulbert_chitanka_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|bg| +|Size:|306.1 MB| + +## References + +https://huggingface.co/mor40/BulBERT-chitanka-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-03-oorito_en.md b/docs/_posts/ahmedlone127/2024-09-03-oorito_en.md new file mode 100644 index 00000000000000..dc430c3cad5616 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-03-oorito_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English oorito MarianTransformer from LRJ1981 +author: John Snow Labs +name: oorito +date: 2024-09-03 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`oorito` is a English model originally trained by LRJ1981. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oorito_en_5.5.0_3.0_1725404166090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oorito_en_5.5.0_3.0_1725404166090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("oorito","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("oorito","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
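With the stages wired as above, the translated text lands in the `translation` column. A quick way to inspect it, continuing from `pipelineDF`:

```python
# One translated string per sentence detected by the sentence detector
pipelineDF.selectExpr("explode(translation.result) AS translated").show(truncate=False)
```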
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oorito| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|504.7 MB| + +## References + +https://huggingface.co/LRJ1981/OORito \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4_en.md b/docs/_posts/ahmedlone127/2024-09-04-deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4_en.md new file mode 100644 index 00000000000000..5383bf3f082b18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4 DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4 +date: 2024-09-04 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4_en_5.5.0_3.0_1725440063827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4_en_5.5.0_3.0_1725440063827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
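Continuing from `pipelineDF`, the predicted labels can be read back from the `class` column:

```python
# "class.result" holds the predicted label for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```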
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_survey_nepal_bhasa_fact_main_passage_rater_gpt4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-large-survey-new_fact_main_passage-rater-gpt4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-dummy_model_umalakshmi07_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-04-dummy_model_umalakshmi07_pipeline_en.md new file mode 100644 index 00000000000000..b1d0e2a6bf51f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-dummy_model_umalakshmi07_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dummy_model_umalakshmi07_pipeline pipeline CamemBertEmbeddings from Umalakshmi07 +author: John Snow Labs +name: dummy_model_umalakshmi07_pipeline +date: 2024-09-04 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_umalakshmi07_pipeline` is a English model originally trained by Umalakshmi07. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_umalakshmi07_pipeline_en_5.5.0_3.0_1725409109729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_umalakshmi07_pipeline_en_5.5.0_3.0_1725409109729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dummy_model_umalakshmi07_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dummy_model_umalakshmi07_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
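The Python and Scala snippets above assume `df` already exists; since this pipeline starts with a DocumentAssembler, it expects a `text` column. A minimal sketch of preparing the input and checking the output schema:

```python
# "df" is assumed to be a Spark DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()
```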
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_umalakshmi07_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/Umalakshmi07/dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-roberta_large_bne_sqac_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-04-roberta_large_bne_sqac_pipeline_es.md new file mode 100644 index 00000000000000..fdc072dadb9ffb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-roberta_large_bne_sqac_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish roberta_large_bne_sqac_pipeline pipeline RoBertaForQuestionAnswering from BSC-LT +author: John Snow Labs +name: roberta_large_bne_sqac_pipeline +date: 2024-09-04 +tags: [es, open_source, pipeline, onnx] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_sqac_pipeline` is a Castilian, Spanish model originally trained by BSC-LT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_sqac_pipeline_es_5.5.0_3.0_1725480000706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_sqac_pipeline_es_5.5.0_3.0_1725480000706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_bne_sqac_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_bne_sqac_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
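`df` is assumed to exist in the snippet above. Because this pipeline is built on a MultiDocumentAssembler, the input column names used below (`question`, `context`) are an assumption; verify them against the loaded pipeline's stages if results come back empty. A minimal sketch:

```python
# Assumed input columns: "question" and "context" (check pipeline.model.stages)
df = spark.createDataFrame(
    [["¿Dónde vivo?", "Vivo en Madrid y trabajo con spark-nlp."]]
).toDF("question", "context")
pipeline.transform(df).printSchema()
```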
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_sqac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.3 GB| + +## References + +https://huggingface.co/BSC-LT/roberta-large-bne-sqac + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-ae_detection_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-ae_detection_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..1c0ad251eb1f44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-ae_detection_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ae_detection_distilbert_pipeline pipeline DistilBertForTokenClassification from merlynjoseph +author: John Snow Labs +name: ae_detection_distilbert_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ae_detection_distilbert_pipeline` is a English model originally trained by merlynjoseph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ae_detection_distilbert_pipeline_en_5.5.0_3.0_1725518567094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ae_detection_distilbert_pipeline_en_5.5.0_3.0_1725518567094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ae_detection_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ae_detection_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
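`df` above is assumed to be a Spark DataFrame with a `text` column (the pipeline begins with a DocumentAssembler). For a quick single-string check, `annotate` returns a plain dict keyed by output column:

```python
# DataFrame route
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

# Single-string route
print(pipeline.annotate("I love spark-nlp"))
```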
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ae_detection_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/merlynjoseph/AE-detection-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad_en.md b/docs/_posts/ahmedlone127/2024-09-05-babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad_en.md new file mode 100644 index 00000000000000..842cbb1e737df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad +date: 2024-09-05 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad_en_5.5.0_3.0_1725576497787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad_en_5.5.0_3.0_1725576497787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
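Continuing from `pipelineDF`, the extracted answer span is available in the `answer` column:

```python
# Show each question next to its predicted answer
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```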
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_french_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia1_1.25M_wikipedia_french1.25M-with-Masking-finetuned-french-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-candle_cvss_scope_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-candle_cvss_scope_pipeline_en.md new file mode 100644 index 00000000000000..eaebf03e465ab0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-candle_cvss_scope_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English candle_cvss_scope_pipeline pipeline MPNetForSequenceClassification from iashour +author: John Snow Labs +name: candle_cvss_scope_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`candle_cvss_scope_pipeline` is a English model originally trained by iashour. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/candle_cvss_scope_pipeline_en_5.5.0_3.0_1725575348236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/candle_cvss_scope_pipeline_en_5.5.0_3.0_1725575348236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("candle_cvss_scope_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("candle_cvss_scope_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
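As in the other pipeline examples, `df` is assumed to be a DataFrame with a `text` column; a minimal sketch:

```python
# Prepare a tiny input DataFrame and run the classification pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).printSchema()
```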
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|candle_cvss_scope_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/iashour/CANDLE_cvss_scope + +## Included Models + +- DocumentAssembler +- TokenizerModel +- MPNetForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-ditransformersert_base_uncased_tokenclassification_lora_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-ditransformersert_base_uncased_tokenclassification_lora_pipeline_en.md new file mode 100644 index 00000000000000..70a25d788b761a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-ditransformersert_base_uncased_tokenclassification_lora_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ditransformersert_base_uncased_tokenclassification_lora_pipeline pipeline DistilBertForTokenClassification from urisoo +author: John Snow Labs +name: ditransformersert_base_uncased_tokenclassification_lora_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ditransformersert_base_uncased_tokenclassification_lora_pipeline` is a English model originally trained by urisoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ditransformersert_base_uncased_tokenclassification_lora_pipeline_en_5.5.0_3.0_1725506157055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ditransformersert_base_uncased_tokenclassification_lora_pipeline_en_5.5.0_3.0_1725506157055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ditransformersert_base_uncased_tokenclassification_lora_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ditransformersert_base_uncased_tokenclassification_lora_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
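`df` is assumed to carry a `text` column. For ad-hoc checks, `annotate` avoids building a DataFrame at all:

```python
# Returns a dict mapping each output column to its string results
print(pipeline.annotate("I love spark-nlp"))
```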
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ditransformersert_base_uncased_tokenclassification_lora_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/urisoo/ditransformersert-base-uncased-tokenclassification_lora + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-m8_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-05-m8_mlm_en.md new file mode 100644 index 00000000000000..9cff7febcb076d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-m8_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English m8_mlm AlbertEmbeddings from S2312dal +author: John Snow Labs +name: m8_mlm +date: 2024-09-05 +tags: [en, open_source, onnx, embeddings, albert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`m8_mlm` is a English model originally trained by S2312dal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/m8_mlm_en_5.5.0_3.0_1725528242291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/m8_mlm_en_5.5.0_3.0_1725528242291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("m8_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("m8_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
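Continuing from `pipelineDF`, the token-level ALBERT vectors sit in the `embeddings` column:

```python
# One row per token with its embedding vector
pipelineDF.selectExpr("explode(embeddings) AS emb") \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(truncate=80)
```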
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|m8_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[albert]| +|Language:|en| +|Size:|42.0 MB| + +## References + +https://huggingface.co/S2312dal/M8_MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-nli_sentence_contradiction_classification_en.md b/docs/_posts/ahmedlone127/2024-09-05-nli_sentence_contradiction_classification_en.md new file mode 100644 index 00000000000000..b5e072ec00b0a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-nli_sentence_contradiction_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nli_sentence_contradiction_classification MPNetForSequenceClassification from mansee +author: John Snow Labs +name: nli_sentence_contradiction_classification +date: 2024-09-05 +tags: [en, open_source, onnx, sequence_classification, mpnet] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nli_sentence_contradiction_classification` is a English model originally trained by mansee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nli_sentence_contradiction_classification_en_5.5.0_3.0_1725575597188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nli_sentence_contradiction_classification_en_5.5.0_3.0_1725575597188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = MPNetForSequenceClassification.pretrained("nli_sentence_contradiction_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = MPNetForSequenceClassification.pretrained("nli_sentence_contradiction_classification", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
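Continuing from `pipelineDF`, the predicted label for each row is in the `class` column:

```python
pipelineDF.select("text", "class.result").show(truncate=False)
```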
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nli_sentence_contradiction_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/mansee/nli-sentence-contradiction-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-question_answering_xlm_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-question_answering_xlm_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..44e4ceef13d39b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-question_answering_xlm_roberta_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English question_answering_xlm_roberta_base_pipeline pipeline XlmRoBertaForQuestionAnswering from TunahanGokcimen +author: John Snow Labs +name: question_answering_xlm_roberta_base_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_xlm_roberta_base_pipeline` is a English model originally trained by TunahanGokcimen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_xlm_roberta_base_pipeline_en_5.5.0_3.0_1725567657530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_xlm_roberta_base_pipeline_en_5.5.0_3.0_1725567657530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("question_answering_xlm_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("question_answering_xlm_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
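`df` is assumed to exist above. As with other question-answering pipelines built on a MultiDocumentAssembler, the expected input columns are assumed here to be `question` and `context`; confirm against the pipeline's stages. A minimal sketch:

```python
# Assumed input columns: "question" and "context"
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
pipeline.transform(df).printSchema()
```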
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_xlm_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.0 MB| + +## References + +https://huggingface.co/TunahanGokcimen/Question-Answering-xlm-roberta-base + +## Included Models + +- MultiDocumentAssembler +- XlmRoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-roberta_large_1802_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-roberta_large_1802_pipeline_en.md new file mode 100644 index 00000000000000..272a54b098878b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-roberta_large_1802_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_1802_pipeline pipeline RoBertaEmbeddings from JamesKim +author: John Snow Labs +name: roberta_large_1802_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_1802_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_1802_pipeline_en_5.5.0_3.0_1725578469476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_1802_pipeline_en_5.5.0_3.0_1725578469476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_1802_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_1802_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
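`df` above is assumed to be a DataFrame with a `text` column, which the pipeline's DocumentAssembler reads; a minimal sketch:

```python
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()
```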
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_1802_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JamesKim/roberta-large_1802 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-chuvash_validator_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-chuvash_validator_pipeline_en.md new file mode 100644 index 00000000000000..ba33326fb1a23d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-chuvash_validator_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chuvash_validator_pipeline pipeline MPNetEmbeddings from ivanzidov +author: John Snow Labs +name: chuvash_validator_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chuvash_validator_pipeline` is a English model originally trained by ivanzidov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chuvash_validator_pipeline_en_5.5.0_3.0_1725595285060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chuvash_validator_pipeline_en_5.5.0_3.0_1725595285060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chuvash_validator_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chuvash_validator_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
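As above, `df` is assumed to provide a `text` column; a minimal sketch of running the pipeline end to end:

```python
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).printSchema()
```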
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chuvash_validator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/ivanzidov/cv-validator + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_imdb_lb100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_imdb_lb100_pipeline_en.md new file mode 100644 index 00000000000000..8d1c7d2e358268 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_imdb_lb100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_lb100_pipeline pipeline DistilBertEmbeddings from LB100 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_lb100_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_lb100_pipeline` is a English model originally trained by LB100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_lb100_pipeline_en_5.5.0_3.0_1725664700897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_lb100_pipeline_en_5.5.0_3.0_1725664700897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_lb100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_lb100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
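`df` is assumed to be a DataFrame with a `text` column; for a one-off check, `annotate` is simpler:

```python
# Returns a dict keyed by the pipeline's output columns
print(pipeline.annotate("I love spark-nlp"))
```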
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_lb100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/LB100/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_ner_cerastes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_ner_cerastes_pipeline_en.md new file mode 100644 index 00000000000000..71c4ca8ef48bf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_ner_cerastes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_ner_cerastes_pipeline pipeline DistilBertForTokenClassification from Cerastes +author: John Snow Labs +name: distilbert_base_uncased_finetuned_ner_cerastes_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_ner_cerastes_pipeline` is a English model originally trained by Cerastes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_cerastes_pipeline_en_5.5.0_3.0_1725653414732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_cerastes_pipeline_en_5.5.0_3.0_1725653414732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_ner_cerastes_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_ner_cerastes_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
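`df` above is assumed to hold a `text` column; a minimal sketch:

```python
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()
```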
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_ner_cerastes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Cerastes/distilbert-base-uncased_finetuned_ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-distilbert_qa_robustqa_baseline_02_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-distilbert_qa_robustqa_baseline_02_pipeline_en.md new file mode 100644 index 00000000000000..7a52c9c82ffd53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-distilbert_qa_robustqa_baseline_02_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_qa_robustqa_baseline_02_pipeline pipeline DistilBertForQuestionAnswering from leemii18 +author: John Snow Labs +name: distilbert_qa_robustqa_baseline_02_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_qa_robustqa_baseline_02_pipeline` is a English model originally trained by leemii18. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_qa_robustqa_baseline_02_pipeline_en_5.5.0_3.0_1725621798325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_qa_robustqa_baseline_02_pipeline_en_5.5.0_3.0_1725621798325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_qa_robustqa_baseline_02_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_qa_robustqa_baseline_02_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
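`df` is assumed to exist; since this pipeline uses a MultiDocumentAssembler, the `question`/`context` column names below are an assumption to verify against the pipeline's stages:

```python
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
pipeline.transform(df).printSchema()
```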
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_qa_robustqa_baseline_02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/leemii18/robustqa-baseline-02 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-dummy_model_fab7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-dummy_model_fab7_pipeline_en.md new file mode 100644 index 00000000000000..b71f8d7811f706 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-dummy_model_fab7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dummy_model_fab7_pipeline pipeline CamemBertEmbeddings from fab7 +author: John Snow Labs +name: dummy_model_fab7_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_fab7_pipeline` is a English model originally trained by fab7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_fab7_pipeline_en_5.5.0_3.0_1725632500895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_fab7_pipeline_en_5.5.0_3.0_1725632500895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dummy_model_fab7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dummy_model_fab7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
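`df` above is assumed to be a DataFrame with a `text` column; `annotate` works for quick string-level checks:

```python
# Quick single-string check; returns a dict keyed by output column
print(pipeline.annotate("I love spark-nlp"))
```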
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_fab7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/fab7/dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi_en.md b/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi_en.md new file mode 100644 index 00000000000000..6d1f7d7e5cf3b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi MarianTransformer from Chituyi +author: John Snow Labs +name: opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi +date: 2024-09-06 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi` is a English model originally trained by Chituyi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi_en_5.5.0_3.0_1725635803514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi_en_5.5.0_3.0_1725635803514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
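As wired above, translations are written to the `translation` column; continuing from `pipelineDF`:

```python
pipelineDF.selectExpr("explode(translation.result) AS translated").show(truncate=False)
```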
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_chituyi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|506.5 MB| + +## References + +https://huggingface.co/Chituyi/opus-mt-en-sw-finetuned-en-to-sw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-task_implicit_task__model_deberta__aug_method_ri_en.md b/docs/_posts/ahmedlone127/2024-09-06-task_implicit_task__model_deberta__aug_method_ri_en.md new file mode 100644 index 00000000000000..fbbbb3a8d0e5fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-task_implicit_task__model_deberta__aug_method_ri_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task_implicit_task__model_deberta__aug_method_ri DeBertaForSequenceClassification from BenjaminOcampo +author: John Snow Labs +name: task_implicit_task__model_deberta__aug_method_ri +date: 2024-09-06 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_implicit_task__model_deberta__aug_method_ri` is a English model originally trained by BenjaminOcampo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_implicit_task__model_deberta__aug_method_ri_en_5.5.0_3.0_1725588781563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_implicit_task__model_deberta__aug_method_ri_en_5.5.0_3.0_1725588781563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("task_implicit_task__model_deberta__aug_method_ri","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("task_implicit_task__model_deberta__aug_method_ri", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
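Continuing from `pipelineDF`, the predicted label is in the `class` column:

```python
pipelineDF.select("text", "class.result").show(truncate=False)
```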
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_implicit_task__model_deberta__aug_method_ri| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|607.4 MB| + +## References + +https://huggingface.co/BenjaminOcampo/task-implicit_task__model-deberta__aug_method-ri \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline_en.md new file mode 100644 index 00000000000000..9981e04b04fdf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline pipeline XlmRoBertaForTokenClassification from ahmad-alismail +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline` is a English model originally trained by ahmad-alismail. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline_en_5.5.0_3.0_1725658387854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline_en_5.5.0_3.0_1725658387854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
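`df` above is assumed to be a DataFrame with a `text` column read by the pipeline's DocumentAssembler; a minimal sketch:

```python
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()
```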
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ahmad_alismail_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ahmad-alismail/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1_en.md b/docs/_posts/ahmedlone127/2024-09-07-all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1_en.md new file mode 100644 index 00000000000000..8a6e23eb0a84b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1 MPNetEmbeddings from luiz-and-robert-thesis +author: John Snow Labs +name: all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1 +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1` is a English model originally trained by luiz-and-robert-thesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1_en_5.5.0_3.0_1725702873782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1_en_5.5.0_3.0_1725702873782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
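The `embeddings` column above contains one annotation per input text, with the sentence vector stored in the annotation's `embeddings` field. A minimal sketch of materializing the raw vectors (assuming the Python `pipelineDF` from the example above):

```python
from pyspark.sql import functions as F

# One dense float array per input row.
vectors = pipelineDF.select(F.explode("embeddings.embeddings").alias("vector"))
vectors.show(1, truncate=80)
```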
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_lr_5e_7_margin_1_epoch_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/luiz-and-robert-thesis/all-mpnet-base-v2-lr-5e-7-margin-1-epoch-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-angela_untranslated_punc_eval_en.md b/docs/_posts/ahmedlone127/2024-09-07-angela_untranslated_punc_eval_en.md new file mode 100644 index 00000000000000..9891bd3365671f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-angela_untranslated_punc_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_punc_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_punc_eval +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_untranslated_punc_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_punc_eval_en_5.5.0_3.0_1725704700534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_punc_eval_en_5.5.0_3.0_1725704700534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_punc_eval","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_punc_eval", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
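To line up each token with its predicted entity tag, the `token` and `ner` annotation columns can be zipped together. A minimal sketch (assuming the Python `pipelineDF` from the example above):

```python
from pyspark.sql import functions as F

# arrays_zip pairs the i-th token with the i-th NER label.
pipelineDF.select(F.explode(F.arrays_zip(pipelineDF.token.result, pipelineDF.ner.result)).alias("cols")) \
    .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ner_label")) \
    .show(truncate=False)
```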
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_punc_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_punc_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-anus_wanus_panus_ranus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-anus_wanus_panus_ranus_pipeline_en.md new file mode 100644 index 00000000000000..f53b5c4dc5b2ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-anus_wanus_panus_ranus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English anus_wanus_panus_ranus_pipeline pipeline DistilBertForSequenceClassification from namebobb +author: John Snow Labs +name: anus_wanus_panus_ranus_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anus_wanus_panus_ranus_pipeline` is a English model originally trained by namebobb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anus_wanus_panus_ranus_pipeline_en_5.5.0_3.0_1725674952365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anus_wanus_panus_ranus_pipeline_en_5.5.0_3.0_1725674952365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("anus_wanus_panus_ranus_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("anus_wanus_panus_ranus_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anus_wanus_panus_ranus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/namebobb/anus-wanus-panus-ranus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_model_mohsinshah_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_model_mohsinshah_en.md new file mode 100644 index 00000000000000..f7f2b4885e7cde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_model_mohsinshah_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_mohsinshah DistilBertForSequenceClassification from mohsinshah +author: John Snow Labs +name: burmese_awesome_model_mohsinshah +date: 2024-09-07 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_mohsinshah` is a English model originally trained by mohsinshah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mohsinshah_en_5.5.0_3.0_1725675118509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mohsinshah_en_5.5.0_3.0_1725675118509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_mohsinshah","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_mohsinshah", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_mohsinshah| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mohsinshah/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_khadidja22_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_khadidja22_pipeline_en.md new file mode 100644 index 00000000000000..54a48c09e79ada --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_khadidja22_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_khadidja22_pipeline pipeline DistilBertForQuestionAnswering from Khadidja22 +author: John Snow Labs +name: burmese_awesome_qa_model_khadidja22_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_khadidja22_pipeline` is a English model originally trained by Khadidja22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_khadidja22_pipeline_en_5.5.0_3.0_1725727323860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_khadidja22_pipeline_en_5.5.0_3.0_1725727323860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("burmese_awesome_qa_model_khadidja22_pipeline", lang = "en")

# example input; the pipeline's MultiDocumentAssembler is assumed to read "question" and "context" columns
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_khadidja22_pipeline", lang = "en")

// example input; the pipeline's MultiDocumentAssembler is assumed to read "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_khadidja22_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Khadidja22/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_ner_model_estebanpedro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_ner_model_estebanpedro_pipeline_en.md new file mode 100644 index 00000000000000..ed907676779caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_ner_model_estebanpedro_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_ner_model_estebanpedro_pipeline pipeline DistilBertForTokenClassification from EstebanPedro +author: John Snow Labs +name: burmese_ner_model_estebanpedro_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_ner_model_estebanpedro_pipeline` is a English model originally trained by EstebanPedro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_ner_model_estebanpedro_pipeline_en_5.5.0_3.0_1725730094644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_ner_model_estebanpedro_pipeline_en_5.5.0_3.0_1725730094644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("burmese_ner_model_estebanpedro_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("burmese_ner_model_estebanpedro_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_ner_model_estebanpedro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/EstebanPedro/my_ner_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-covid_qa_distillbert_en.md b/docs/_posts/ahmedlone127/2024-09-07-covid_qa_distillbert_en.md new file mode 100644 index 00000000000000..909fad0c6b15a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-covid_qa_distillbert_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English covid_qa_distillbert DistilBertForQuestionAnswering from shainahub +author: John Snow Labs +name: covid_qa_distillbert +date: 2024-09-07 +tags: [distilbert, en, open_source, question_answering, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_qa_distillbert` is a English model originally trained by shainahub. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_qa_distillbert_en_5.5.0_3.0_1725722640275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_qa_distillbert_en_5.5.0_3.0_1725722640275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("covid_qa_distillbert","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([document_assembler, spanClassifier])

# example input with the question/context columns the assembler expects
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
val document_assembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering
  .pretrained("covid_qa_distillbert", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(document_assembler, spanClassifier))

// example input with the question/context columns the assembler expects
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
</div>
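Once the pipeline has run, the extracted answer span sits in the `result` field of the `answer` column; a minimal sketch (assuming the Python `pipelineDF` and example `data` above):

```python
pipelineDF.select("question", "answer.result").show(truncate=False)
```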
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_qa_distillbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/shainahub/covid_qa_distillbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_carels_en.md b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_carels_en.md new file mode 100644 index 00000000000000..af013c5893d7c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_carels_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_carels DistilBertForQuestionAnswering from CarelS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_carels +date: 2024-09-07 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_carels` is a English model originally trained by CarelS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_carels_en_5.5.0_3.0_1725735877604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_carels_en_5.5.0_3.0_1725735877604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_carels","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_carels", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
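For ad-hoc queries outside a batch job, a `LightPipeline` wrapped around the fitted model is often more convenient. A minimal sketch, assuming the fitted `pipelineModel` above and the two-argument `fullAnnotate` form used for question answering:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# First argument is the question, second is the context.
annotations = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(annotations[0]["answer"][0].result)
```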
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_carels| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/CarelS/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distilbert_finetuned_ner_veronica1608_en.md b/docs/_posts/ahmedlone127/2024-09-07-distilbert_finetuned_ner_veronica1608_en.md new file mode 100644 index 00000000000000..3f965ad84914cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distilbert_finetuned_ner_veronica1608_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_ner_veronica1608 DistilBertForTokenClassification from veronica1608 +author: John Snow Labs +name: distilbert_finetuned_ner_veronica1608 +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_ner_veronica1608` is a English model originally trained by veronica1608. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_ner_veronica1608_en_5.5.0_3.0_1725734330442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_ner_veronica1608_en_5.5.0_3.0_1725734330442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_finetuned_ner_veronica1608","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_finetuned_ner_veronica1608", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_ner_veronica1608| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/veronica1608/distilbert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distillber_squadv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-distillber_squadv2_pipeline_en.md new file mode 100644 index 00000000000000..5db0bb2fd6e174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distillber_squadv2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distillber_squadv2_pipeline pipeline DistilBertForQuestionAnswering from hskfd +author: John Snow Labs +name: distillber_squadv2_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillber_squadv2_pipeline` is a English model originally trained by hskfd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillber_squadv2_pipeline_en_5.5.0_3.0_1725745965721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillber_squadv2_pipeline_en_5.5.0_3.0_1725745965721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("distillber_squadv2_pipeline", lang = "en")

# example input; the pipeline's MultiDocumentAssembler is assumed to read "question" and "context" columns
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("distillber_squadv2_pipeline", lang = "en")

// example input; the pipeline's MultiDocumentAssembler is assumed to read "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillber_squadv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/hskfd/distillber-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-dummy_model_hanzhuo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-dummy_model_hanzhuo_pipeline_en.md new file mode 100644 index 00000000000000..8ee607aae5c145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-dummy_model_hanzhuo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dummy_model_hanzhuo_pipeline pipeline CamemBertEmbeddings from hanzhuo +author: John Snow Labs +name: dummy_model_hanzhuo_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_hanzhuo_pipeline` is a English model originally trained by hanzhuo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_hanzhuo_pipeline_en_5.5.0_3.0_1725729041550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_hanzhuo_pipeline_en_5.5.0_3.0_1725729041550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("dummy_model_hanzhuo_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("dummy_model_hanzhuo_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_hanzhuo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/hanzhuo/dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-esperberto_goumbalamm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-esperberto_goumbalamm_pipeline_en.md new file mode 100644 index 00000000000000..45164e7523ede5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-esperberto_goumbalamm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English esperberto_goumbalamm_pipeline pipeline RoBertaEmbeddings from goumbalamm +author: John Snow Labs +name: esperberto_goumbalamm_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esperberto_goumbalamm_pipeline` is a English model originally trained by goumbalamm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esperberto_goumbalamm_pipeline_en_5.5.0_3.0_1725678221597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esperberto_goumbalamm_pipeline_en_5.5.0_3.0_1725678221597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("esperberto_goumbalamm_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("esperberto_goumbalamm_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esperberto_goumbalamm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/goumbalamm/EsperBERTo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-hafez_bert_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-07-hafez_bert_pipeline_fa.md new file mode 100644 index 00000000000000..fb610f26412ae8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-hafez_bert_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian hafez_bert_pipeline pipeline BertEmbeddings from ViravirastSHZ +author: John Snow Labs +name: hafez_bert_pipeline +date: 2024-09-07 +tags: [fa, open_source, pipeline, onnx] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hafez_bert_pipeline` is a Persian model originally trained by ViravirastSHZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hafez_bert_pipeline_fa_5.5.0_3.0_1725696850593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hafez_bert_pipeline_fa_5.5.0_3.0_1725696850593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("hafez_bert_pipeline", lang = "fa")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("hafez_bert_pipeline", lang = "fa")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hafez_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ViravirastSHZ/Hafez_Bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-hupd_distilroberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-hupd_distilroberta_base_pipeline_en.md new file mode 100644 index 00000000000000..69db343e62abee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-hupd_distilroberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hupd_distilroberta_base_pipeline pipeline RoBertaEmbeddings from HUPD +author: John Snow Labs +name: hupd_distilroberta_base_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hupd_distilroberta_base_pipeline` is a English model originally trained by HUPD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hupd_distilroberta_base_pipeline_en_5.5.0_3.0_1725716529638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hupd_distilroberta_base_pipeline_en_5.5.0_3.0_1725716529638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("hupd_distilroberta_base_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("hupd_distilroberta_base_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hupd_distilroberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/HUPD/hupd-distilroberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-idiom_xlm_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-07-idiom_xlm_roberta_en.md new file mode 100644 index 00000000000000..7a69f4510fa7ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-idiom_xlm_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English idiom_xlm_roberta XlmRoBertaForTokenClassification from imranraad +author: John Snow Labs +name: idiom_xlm_roberta +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`idiom_xlm_roberta` is a English model originally trained by imranraad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/idiom_xlm_roberta_en_5.5.0_3.0_1725688504675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/idiom_xlm_roberta_en_5.5.0_3.0_1725688504675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("idiom_xlm_roberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("idiom_xlm_roberta", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|idiom_xlm_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|803.7 MB| + +## References + +https://huggingface.co/imranraad/idiom-xlm-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-jcblaise_roberta_tagalog_base_ft_udpos213_top2lang_en.md b/docs/_posts/ahmedlone127/2024-09-07-jcblaise_roberta_tagalog_base_ft_udpos213_top2lang_en.md new file mode 100644 index 00000000000000..04f4dea9a9ff20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-jcblaise_roberta_tagalog_base_ft_udpos213_top2lang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jcblaise_roberta_tagalog_base_ft_udpos213_top2lang RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: jcblaise_roberta_tagalog_base_ft_udpos213_top2lang +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jcblaise_roberta_tagalog_base_ft_udpos213_top2lang` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jcblaise_roberta_tagalog_base_ft_udpos213_top2lang_en_5.5.0_3.0_1725721364058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jcblaise_roberta_tagalog_base_ft_udpos213_top2lang_en_5.5.0_3.0_1725721364058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("jcblaise_roberta_tagalog_base_ft_udpos213_top2lang","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("jcblaise_roberta_tagalog_base_ft_udpos213_top2lang", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jcblaise_roberta_tagalog_base_ft_udpos213_top2lang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/jcblaise-roberta-tagalog-base_ft_udpos213-top2lang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-opus_ecolindo_best_loss_bleu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-opus_ecolindo_best_loss_bleu_pipeline_en.md new file mode 100644 index 00000000000000..2735bf8eed2a7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-opus_ecolindo_best_loss_bleu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_ecolindo_best_loss_bleu_pipeline pipeline MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_ecolindo_best_loss_bleu_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_ecolindo_best_loss_bleu_pipeline` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_ecolindo_best_loss_bleu_pipeline_en_5.5.0_3.0_1725740700891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_ecolindo_best_loss_bleu_pipeline_en_5.5.0_3.0_1725740700891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("opus_ecolindo_best_loss_bleu_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("opus_ecolindo_best_loss_bleu_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_ecolindo_best_loss_bleu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|482.5 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-ecolindo-best-loss-bleu + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline_en.md new file mode 100644 index 00000000000000..0668477009bc7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline pipeline MarianTransformer from s3h +author: John Snow Labs +name: opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline` is a English model originally trained by s3h. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline_en_5.5.0_3.0_1725740548093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline_en_5.5.0_3.0_1725740548093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline", lang = "en")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline", lang = "en")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_arabic_english_finetuned_src_tonga_tonga_islands_trg_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.5 MB| + +## References + +https://huggingface.co/s3h/opus-mt-ar-en-finetuned-src-to-trg-testing + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-roberta_base_epoch_41_en.md b/docs/_posts/ahmedlone127/2024-09-07-roberta_base_epoch_41_en.md new file mode 100644 index 00000000000000..caa9191393929a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-roberta_base_epoch_41_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_41 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_41 +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_41` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_41_en_5.5.0_3.0_1725678767788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_41_en_5.5.0_3.0_1725678767788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_41","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_41","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
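If the embeddings are meant to feed a downstream Spark ML stage, an `EmbeddingsFinisher` can convert the annotation column into plain vectors. A minimal sketch appended to the Python pipeline above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.selectExpr("explode(finished_embeddings) as embedding_vector").show(1, truncate=80)
```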
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_41| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_41 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-sent_bert_base_polish_uncased_v1_pipeline_pl.md b/docs/_posts/ahmedlone127/2024-09-07-sent_bert_base_polish_uncased_v1_pipeline_pl.md new file mode 100644 index 00000000000000..343d709d97b9c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-sent_bert_base_polish_uncased_v1_pipeline_pl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Polish sent_bert_base_polish_uncased_v1_pipeline pipeline BertSentenceEmbeddings from dkleczek +author: John Snow Labs +name: sent_bert_base_polish_uncased_v1_pipeline +date: 2024-09-07 +tags: [pl, open_source, pipeline, onnx] +task: Embeddings +language: pl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_polish_uncased_v1_pipeline` is a Polish model originally trained by dkleczek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_polish_uncased_v1_pipeline_pl_5.5.0_3.0_1725700404934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_polish_uncased_v1_pipeline_pl_5.5.0_3.0_1725700404934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

pipeline = PretrainedPipeline("sent_bert_base_polish_uncased_v1_pipeline", lang = "pl")

# example input; the pipeline's DocumentAssembler is assumed to read a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)

```
```scala

val pipeline = new PretrainedPipeline("sent_bert_base_polish_uncased_v1_pipeline", lang = "pl")

// example input; the pipeline's DocumentAssembler is assumed to read a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_polish_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pl| +|Size:|494.1 MB| + +## References + +https://huggingface.co/dkleczek/bert-base-polish-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-study_dummy_en.md b/docs/_posts/ahmedlone127/2024-09-07-study_dummy_en.md new file mode 100644 index 00000000000000..ab129b4241329b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-study_dummy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English study_dummy CamemBertEmbeddings from litlhope +author: John Snow Labs +name: study_dummy +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`study_dummy` is a English model originally trained by litlhope. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/study_dummy_en_5.5.0_3.0_1725692260841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/study_dummy_en_5.5.0_3.0_1725692260841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("study_dummy","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("study_dummy","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|study_dummy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/litlhope/study-dummy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-weights_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-weights_pipeline_en.md new file mode 100644 index 00000000000000..ca333119181593 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-weights_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English weights_pipeline pipeline RoBertaEmbeddings from vsty +author: John Snow Labs +name: weights_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`weights_pipeline` is a English model originally trained by vsty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/weights_pipeline_en_5.5.0_3.0_1725716850265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/weights_pipeline_en_5.5.0_3.0_1725716850265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("weights_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("weights_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|weights_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|275.1 MB| + +## References + +https://huggingface.co/vsty/weights + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-whisper_noisy_en.md b/docs/_posts/ahmedlone127/2024-09-07-whisper_noisy_en.md new file mode 100644 index 00000000000000..d956e7c34e8672 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-whisper_noisy_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_noisy WhisperForCTC from alxfng +author: John Snow Labs +name: whisper_noisy +date: 2024-09-07 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_noisy` is a English model originally trained by alxfng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_noisy_en_5.5.0_3.0_1725751456084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_noisy_en_5.5.0_3.0_1725751456084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be an existing DataFrame with an "audio_content" column holding the raw audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_noisy","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be an existing DataFrame with an "audio_content" column holding the raw audio samples
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_noisy", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
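+
+A minimal sketch of how the `data` DataFrame used above could be prepared before fitting the pipeline, assuming a local 16 kHz WAV file; the file name and the use of `librosa` are illustrative assumptions, not part of the original card:
+
+```python
+import librosa
+
+# Whisper models expect 16 kHz mono audio; librosa resamples on load
+waveform, sampling_rate = librosa.load("sample.wav", sr=16000)
+
+# AudioAssembler reads the raw samples from a float-array column named "audio_content"
+data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
+
+# After transform, the transcription is available in the "text" annotation column
+pipelineModel.transform(data).select("text.result").show(truncate=False)
+```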
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_noisy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.5 MB| + +## References + +https://huggingface.co/alxfng/whisper-noisy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_final_vietnam_aug_insert_w2v_2_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_final_vietnam_aug_insert_w2v_2_en.md new file mode 100644 index 00000000000000..acb391bb299b53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_final_vietnam_aug_insert_w2v_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_vietnam_aug_insert_w2v_2 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_vietnam_aug_insert_w2v_2 +date: 2024-09-07 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_vietnam_aug_insert_w2v_2` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_insert_w2v_2_en_5.5.0_3.0_1725669414119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_insert_w2v_2_en_5.5.0_3.0_1725669414119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_vietnam_aug_insert_w2v_2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_vietnam_aug_insert_w2v_2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
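+
+After the pipeline has run, the predicted label can be read from the `class` annotation column; a minimal sketch using the column names configured above:
+
+```python
+# "class.result" holds the predicted label for each input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+```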
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_vietnam_aug_insert_w2v_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_VietNam-aug_insert_w2v-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_french_inniok_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_french_inniok_pipeline_en.md new file mode 100644 index 00000000000000..7ec13d728f1d80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_french_inniok_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_inniok_pipeline pipeline XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_inniok_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_inniok_pipeline` is a English model originally trained by inniok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_inniok_pipeline_en_5.5.0_3.0_1725693856599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_inniok_pipeline_en_5.5.0_3.0_1725693856599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_inniok_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_inniok_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_inniok_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlmroberta_ner_tner_base_ontonotes5_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlmroberta_ner_tner_base_ontonotes5_en.md new file mode 100644 index 00000000000000..8c3a3ab11b0615 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlmroberta_ner_tner_base_ontonotes5_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from asahi417) +author: John Snow Labs +name: xlmroberta_ner_tner_base_ontonotes5 +date: 2024-09-07 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tner-xlm-roberta-base-ontonotes5` is a English model originally trained by `asahi417`. + +## Predicted Entities + +`language`, `product`, `percent`, `time`, `quantity`, `ordinal number`, `law`, `cardinal number`, `facility`, `event`, `geopolitical area`, `organization`, `group`, `money`, `work of art`, `person`, `location`, `date` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_tner_base_ontonotes5_en_5.5.0_3.0_1725689019834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_tner_base_ontonotes5_en_5.5.0_3.0_1725689019834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_tner_base_ontonotes5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_tner_base_ontonotes5","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.xlmr_roberta.ontonotes5.base.by_asahi417").predict("""PUT YOUR STRING HERE""")
+```
+</div>
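+
+The recognized entities end up in the `ner_chunk` column produced by the NerConverter; a minimal sketch using the column names configured above:
+
+```python
+# One row per detected chunk, with the chunk text and its predicted entity type
+result.selectExpr("explode(ner_chunk) as chunk") \
+    .selectExpr("chunk.result as chunk_text", "chunk.metadata['entity'] as entity") \
+    .show(truncate=False)
+```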
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_tner_base_ontonotes5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|798.0 MB| + +## References + +References + +- https://huggingface.co/asahi417/tner-xlm-roberta-base-ontonotes5 +- https://github.com/asahi417/tner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-07-xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline_it.md new file mode 100644 index 00000000000000..930a0657bb08ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline pipeline XlmRoBertaForTokenClassification from gunghio +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline +date: 2024-09-07 +tags: [it, open_source, pipeline, onnx] +task: Named Entity Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline` is a Italian model originally trained by gunghio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline_it_5.5.0_3.0_1725743924536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline_it_5.5.0_3.0_1725743924536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_panx_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|877.6 MB| + +## References + +https://huggingface.co/gunghio/xlm-roberta-base-finetuned-panx-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-all_roberta_large_v1_meta_4_16_5_oos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-all_roberta_large_v1_meta_4_16_5_oos_pipeline_en.md new file mode 100644 index 00000000000000..fc2156532927fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-all_roberta_large_v1_meta_4_16_5_oos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_meta_4_16_5_oos_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_meta_4_16_5_oos_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_meta_4_16_5_oos_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_meta_4_16_5_oos_pipeline_en_5.5.0_3.0_1725830265458.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_meta_4_16_5_oos_pipeline_en_5.5.0_3.0_1725830265458.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_meta_4_16_5_oos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_meta_4_16_5_oos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_meta_4_16_5_oos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-meta-4-16-5-oos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-bert_polish_sentiment_politics_pl.md b/docs/_posts/ahmedlone127/2024-09-08-bert_polish_sentiment_politics_pl.md new file mode 100644 index 00000000000000..0f8f1d8fac8cb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-bert_polish_sentiment_politics_pl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Polish bert_polish_sentiment_politics BertForSequenceClassification from eevvgg +author: John Snow Labs +name: bert_polish_sentiment_politics +date: 2024-09-08 +tags: [pl, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_polish_sentiment_politics` is a Polish model originally trained by eevvgg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_polish_sentiment_politics_pl_5.5.0_3.0_1725838924216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_polish_sentiment_politics_pl_5.5.0_3.0_1725838924216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_polish_sentiment_politics","pl") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_polish_sentiment_politics", "pl")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_polish_sentiment_politics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pl| +|Size:|495.8 MB| + +## References + +https://huggingface.co/eevvgg/bert-polish-sentiment-politics \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-biolinkbert_base_michiyasunaga_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-biolinkbert_base_michiyasunaga_pipeline_en.md new file mode 100644 index 00000000000000..38e5bf0ba91901 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-biolinkbert_base_michiyasunaga_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biolinkbert_base_michiyasunaga_pipeline pipeline BertForSequenceClassification from michiyasunaga +author: John Snow Labs +name: biolinkbert_base_michiyasunaga_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biolinkbert_base_michiyasunaga_pipeline` is a English model originally trained by michiyasunaga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biolinkbert_base_michiyasunaga_pipeline_en_5.5.0_3.0_1725761245691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biolinkbert_base_michiyasunaga_pipeline_en_5.5.0_3.0_1725761245691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biolinkbert_base_michiyasunaga_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biolinkbert_base_michiyasunaga_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biolinkbert_base_michiyasunaga_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.7 MB| + +## References + +https://huggingface.co/michiyasunaga/BioLinkBERT-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_wingpatil_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_wingpatil_en.md new file mode 100644 index 00000000000000..1854f9865b3514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_wingpatil_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_wingpatil DistilBertForQuestionAnswering from WingPatil +author: John Snow Labs +name: burmese_awesome_qa_model_wingpatil +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_wingpatil` is a English model originally trained by WingPatil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wingpatil_en_5.5.0_3.0_1725818222191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wingpatil_en_5.5.0_3.0_1725818222191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_wingpatil","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_wingpatil", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
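+
+After the pipeline has run, the extracted answer span is available in the `answer` annotation column; a minimal sketch using the column names configured above:
+
+```python
+# Show the question and context next to the predicted answer span
+pipelineDF.select("question", "context", "answer.result").show(truncate=False)
+```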
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_wingpatil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/WingPatil/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_wingpatil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_wingpatil_pipeline_en.md new file mode 100644 index 00000000000000..7f08dad77bbcdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_wingpatil_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_wingpatil_pipeline pipeline DistilBertForQuestionAnswering from WingPatil +author: John Snow Labs +name: burmese_awesome_qa_model_wingpatil_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_wingpatil_pipeline` is a English model originally trained by WingPatil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wingpatil_pipeline_en_5.5.0_3.0_1725818234626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wingpatil_pipeline_en_5.5.0_3.0_1725818234626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_wingpatil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_wingpatil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_wingpatil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/WingPatil/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_xbotwovenware_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_xbotwovenware_en.md new file mode 100644 index 00000000000000..be7baa54a187e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_xbotwovenware_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_xbotwovenware DistilBertForQuestionAnswering from xbotwovenware +author: John Snow Labs +name: burmese_awesome_qa_model_xbotwovenware +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_xbotwovenware` is a English model originally trained by xbotwovenware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_xbotwovenware_en_5.5.0_3.0_1725823193444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_xbotwovenware_en_5.5.0_3.0_1725823193444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_xbotwovenware","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_xbotwovenware", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_xbotwovenware| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/xbotwovenware/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_model_jaepax_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_model_jaepax_pipeline_en.md new file mode 100644 index 00000000000000..7da39af4484389 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_model_jaepax_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_jaepax_pipeline pipeline DistilBertForTokenClassification from JaepaX +author: John Snow Labs +name: burmese_awesome_wnut_model_jaepax_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_jaepax_pipeline` is a English model originally trained by JaepaX. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_jaepax_pipeline_en_5.5.0_3.0_1725837217305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_jaepax_pipeline_en_5.5.0_3.0_1725837217305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_jaepax_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_jaepax_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_jaepax_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/JaepaX/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_model_maniack_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_model_maniack_pipeline_en.md new file mode 100644 index 00000000000000..83f4e441e7b7d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_model_maniack_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_maniack_pipeline pipeline DistilBertForTokenClassification from maniack +author: John Snow Labs +name: burmese_awesome_wnut_model_maniack_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_maniack_pipeline` is a English model originally trained by maniack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_maniack_pipeline_en_5.5.0_3.0_1725788657994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_maniack_pipeline_en_5.5.0_3.0_1725788657994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_maniack_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_maniack_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_maniack_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/maniack/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-cnec2_0_supertypes_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-cnec2_0_supertypes_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..29e4e03072b773 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-cnec2_0_supertypes_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cnec2_0_supertypes_distilbert_pipeline pipeline DistilBertForTokenClassification from stulcrad +author: John Snow Labs +name: cnec2_0_supertypes_distilbert_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnec2_0_supertypes_distilbert_pipeline` is a English model originally trained by stulcrad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnec2_0_supertypes_distilbert_pipeline_en_5.5.0_3.0_1725837317362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnec2_0_supertypes_distilbert_pipeline_en_5.5.0_3.0_1725837317362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cnec2_0_supertypes_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cnec2_0_supertypes_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnec2_0_supertypes_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.5 MB| + +## References + +https://huggingface.co/stulcrad/CNEC2_0_Supertypes_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-cross_encoder_russian_msmarco_ru.md b/docs/_posts/ahmedlone127/2024-09-08-cross_encoder_russian_msmarco_ru.md new file mode 100644 index 00000000000000..a18705a146527b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-cross_encoder_russian_msmarco_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian cross_encoder_russian_msmarco BertForSequenceClassification from DiTy +author: John Snow Labs +name: cross_encoder_russian_msmarco +date: 2024-09-08 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_russian_msmarco` is a Russian model originally trained by DiTy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_russian_msmarco_ru_5.5.0_3.0_1725802229812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_russian_msmarco_ru_5.5.0_3.0_1725802229812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_russian_msmarco","ru") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_russian_msmarco", "ru")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_russian_msmarco| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|666.5 MB| + +## References + +https://huggingface.co/DiTy/cross-encoder-russian-msmarco \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-cw_ner_model_pipeline_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-cw_ner_model_pipeline_pipeline_en.md new file mode 100644 index 00000000000000..1f6ad46b0bd8e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-cw_ner_model_pipeline_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cw_ner_model_pipeline_pipeline pipeline DistilBertForTokenClassification from ArshiaKarimian +author: John Snow Labs +name: cw_ner_model_pipeline_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cw_ner_model_pipeline_pipeline` is a English model originally trained by ArshiaKarimian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cw_ner_model_pipeline_pipeline_en_5.5.0_3.0_1725828002224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cw_ner_model_pipeline_pipeline_en_5.5.0_3.0_1725828002224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cw_ner_model_pipeline_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cw_ner_model_pipeline_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cw_ner_model_pipeline_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/ArshiaKarimian/CW_NER_MODEL_PIPELINE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-deberta_amazon_reviews_v1_bweb771_en.md b/docs/_posts/ahmedlone127/2024-09-08-deberta_amazon_reviews_v1_bweb771_en.md new file mode 100644 index 00000000000000..e7f6cc6e0a1aad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-deberta_amazon_reviews_v1_bweb771_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_amazon_reviews_v1_bweb771 DeBertaForSequenceClassification from bweb771 +author: John Snow Labs +name: deberta_amazon_reviews_v1_bweb771 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_amazon_reviews_v1_bweb771` is a English model originally trained by bweb771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_amazon_reviews_v1_bweb771_en_5.5.0_3.0_1725804040284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_amazon_reviews_v1_bweb771_en_5.5.0_3.0_1725804040284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_amazon_reviews_v1_bweb771","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_amazon_reviews_v1_bweb771", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_amazon_reviews_v1_bweb771| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.4 MB| + +## References + +https://huggingface.co/bweb771/deberta_amazon_reviews_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline_en.md new file mode 100644 index 00000000000000..867678d7f0f235 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline pipeline DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline_en_5.5.0_3.0_1725803179187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline_en_5.5.0_3.0_1725803179187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_finetuned_dagpap22_synthetic_all_overfit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-large-finetuned-DAGPap22-synthetic-all-overfit + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_ner_cnguyenta_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_ner_cnguyenta_en.md new file mode 100644 index 00000000000000..cdd048bcddf72a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_ner_cnguyenta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_ner_cnguyenta DistilBertForTokenClassification from cnguyenta +author: John Snow Labs +name: distilbert_base_uncased_finetuned_ner_cnguyenta +date: 2024-09-08 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_ner_cnguyenta` is a English model originally trained by cnguyenta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_cnguyenta_en_5.5.0_3.0_1725837410078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_cnguyenta_en_5.5.0_3.0_1725837410078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_finetuned_ner_cnguyenta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_finetuned_ner_cnguyenta", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
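+
+Token-level predictions can be inspected as parallel arrays of tokens and tags; a minimal sketch using the column names configured above:
+
+```python
+# "token.result" and "ner.result" are aligned arrays: one predicted tag per token
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```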
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_ner_cnguyenta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/cnguyenta/distilbert-base-uncased-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..89ef13e6f54b8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline_en_5.5.0_3.0_1725775392717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline_en_5.5.0_3.0_1725775392717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
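+
+For quick, single-string checks, `PretrainedPipeline` also exposes `annotate`, which returns a plain Python dict keyed by the pipeline's output columns (a minimal sketch; print the keys to see exactly which columns this export produces):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline", lang="en")
+
+result = pipeline.annotate("I love spark-nlp")
+print(result.keys())  # output columns produced by this pipeline
+print(result)
+```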
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1_plprefix0stlarge16_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1_PLPrefix0stlarge16_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_finetuned_squadv2_zoanhy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_finetuned_squadv2_zoanhy_pipeline_en.md new file mode 100644 index 00000000000000..50c4e42b16d7b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_finetuned_squadv2_zoanhy_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_zoanhy_pipeline pipeline DistilBertForQuestionAnswering from zoanhy +author: John Snow Labs +name: distilbert_finetuned_squadv2_zoanhy_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_zoanhy_pipeline` is a English model originally trained by zoanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_zoanhy_pipeline_en_5.5.0_3.0_1725818540318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_zoanhy_pipeline_en_5.5.0_3.0_1725818540318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df holds the raw question and context texts. The column names here are an
+# assumption ("question"/"context" is the usual layout for exported QA pipelines);
+# adjust them to match the pipeline's MultiDocumentAssembler inputs if needed.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("distilbert_finetuned_squadv2_zoanhy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df holds the raw question and context texts ("question"/"context" column names
+// are an assumption; adjust to match the pipeline's MultiDocumentAssembler inputs)
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+
+val pipeline = new PretrainedPipeline("distilbert_finetuned_squadv2_zoanhy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_zoanhy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/zoanhy/distilbert-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-dock_2_en.md b/docs/_posts/ahmedlone127/2024-09-08-dock_2_en.md new file mode 100644 index 00000000000000..d0cc4d6d1cc7b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-dock_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dock_2 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: dock_2 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dock_2` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dock_2_en_5.5.0_3.0_1725820972714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dock_2_en_5.5.0_3.0_1725820972714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("dock_2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("dock_2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
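+
+Each input row receives its predicted label in the `class` column set above; a minimal way to read the predictions back out:
+
+```python
+# The predicted label(s) per row, taken from the "class" output column.
+pipelineDF.select("class.result").show(truncate=False)
+```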
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dock_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Dock_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512_en.md b/docs/_posts/ahmedlone127/2024-09-08-has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512_en.md new file mode 100644 index 00000000000000..d1192e19f1e438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512 BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512_en_5.5.0_3.0_1725768500772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512_en_5.5.0_3.0_1725768500772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|has_the_doctor_specified_whether_the_patient_can_belarusian_seen_heard_bert_first512| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/has_the_doctor_specified_whether_the_patient_can_be_seen_heard_bert_First512 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-helsinki_multiple_languages_english_test_en.md b/docs/_posts/ahmedlone127/2024-09-08-helsinki_multiple_languages_english_test_en.md new file mode 100644 index 00000000000000..4d9040e9408c01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-helsinki_multiple_languages_english_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English helsinki_multiple_languages_english_test MarianTransformer from Shularp +author: John Snow Labs +name: helsinki_multiple_languages_english_test +date: 2024-09-08 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_multiple_languages_english_test` is a English model originally trained by Shularp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_multiple_languages_english_test_en_5.5.0_3.0_1725824655561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_multiple_languages_english_test_en_5.5.0_3.0_1725824655561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("helsinki_multiple_languages_english_test","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("helsinki_multiple_languages_english_test","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
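+
+With the columns wired as above, the sentence detector feeds one `sentence` annotation per detected sentence into the MarianTransformer, which writes its output to `translation`. A minimal sketch for listing the translated sentences:
+
+```python
+from pyspark.sql import functions as F
+
+# One row per translated sentence, read from the "translation" output column.
+pipelineDF.select(F.explode("translation.result").alias("translated_text")).show(truncate=False)
+```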
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_multiple_languages_english_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|532.4 MB| + +## References + +https://huggingface.co/Shularp/Helsinki_mul-en_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-intent_classifier_frana9812_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-intent_classifier_frana9812_pipeline_en.md new file mode 100644 index 00000000000000..be03f728224770 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-intent_classifier_frana9812_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intent_classifier_frana9812_pipeline pipeline DistilBertForSequenceClassification from Frana9812 +author: John Snow Labs +name: intent_classifier_frana9812_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_classifier_frana9812_pipeline` is a English model originally trained by Frana9812. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_classifier_frana9812_pipeline_en_5.5.0_3.0_1725774870205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_classifier_frana9812_pipeline_en_5.5.0_3.0_1725774870205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("intent_classifier_frana9812_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("intent_classifier_frana9812_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_classifier_frana9812_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Frana9812/intent_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128_en.md b/docs/_posts/ahmedlone127/2024-09-08-invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128_en.md new file mode 100644 index 00000000000000..358b2d928b4bdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128 BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128_en_5.5.0_3.0_1725768285726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128_en_5.5.0_3.0_1725768285726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|invited_the_patient_tonga_tonga_islands_a_dialogue_bert_first128| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/invited_the_patient_to_a_dialogue_bert_First128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-lab1_random_chenxin0903_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-lab1_random_chenxin0903_pipeline_en.md new file mode 100644 index 00000000000000..40fb927d72f1e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-lab1_random_chenxin0903_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_random_chenxin0903_pipeline pipeline MarianTransformer from Chenxin0903 +author: John Snow Labs +name: lab1_random_chenxin0903_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_chenxin0903_pipeline` is a English model originally trained by Chenxin0903. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_chenxin0903_pipeline_en_5.5.0_3.0_1725824732042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_chenxin0903_pipeline_en_5.5.0_3.0_1725824732042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("lab1_random_chenxin0903_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("lab1_random_chenxin0903_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_chenxin0903_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|510.3 MB| + +## References + +https://huggingface.co/Chenxin0903/lab1_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-lab1_random_den_sota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-lab1_random_den_sota_pipeline_en.md new file mode 100644 index 00000000000000..5f783328b6040e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-lab1_random_den_sota_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_random_den_sota_pipeline pipeline MarianTransformer from Den-sota +author: John Snow Labs +name: lab1_random_den_sota_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_den_sota_pipeline` is a English model originally trained by Den-sota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_den_sota_pipeline_en_5.5.0_3.0_1725824342058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_den_sota_pipeline_en_5.5.0_3.0_1725824342058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("lab1_random_den_sota_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("lab1_random_den_sota_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_den_sota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/Den-sota/lab1_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-lenu_us_chichewa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-lenu_us_chichewa_pipeline_en.md new file mode 100644 index 00000000000000..773ac03377704b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-lenu_us_chichewa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenu_us_chichewa_pipeline pipeline BertForSequenceClassification from Sociovestix +author: John Snow Labs +name: lenu_us_chichewa_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenu_us_chichewa_pipeline` is a English model originally trained by Sociovestix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenu_us_chichewa_pipeline_en_5.5.0_3.0_1725819023062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenu_us_chichewa_pipeline_en_5.5.0_3.0_1725819023062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("lenu_us_chichewa_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("lenu_us_chichewa_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenu_us_chichewa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Sociovestix/lenu_US-NY + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-ner_bert_model_en.md b/docs/_posts/ahmedlone127/2024-09-08-ner_bert_model_en.md new file mode 100644 index 00000000000000..cd833ec4b11fea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-ner_bert_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_bert_model DistilBertForTokenClassification from harsh13333 +author: John Snow Labs +name: ner_bert_model +date: 2024-09-08 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_bert_model` is a English model originally trained by harsh13333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bert_model_en_5.5.0_3.0_1725788733810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bert_model_en_5.5.0_3.0_1725788733810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("ner_bert_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("ner_bert_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
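+
+Fitting is cheap here (none of the stages are trainable), but persisting the fitted `PipelineModel` avoids re-downloading the pretrained weights in later jobs. This uses standard Spark ML persistence (a sketch; the path is only an example):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline once, then reload it wherever it is needed.
+pipelineModel.write().overwrite().save("/tmp/ner_bert_model_pipeline")  # example path
+reloaded = PipelineModel.load("/tmp/ner_bert_model_pipeline")
+reloaded.transform(data).select("ner.result").show(truncate=False)
+```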
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_bert_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/harsh13333/ner_bert_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-ner_bert_uncased_model_en.md b/docs/_posts/ahmedlone127/2024-09-08-ner_bert_uncased_model_en.md new file mode 100644 index 00000000000000..838f057116ba63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-ner_bert_uncased_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_bert_uncased_model DistilBertForTokenClassification from pritmanvar +author: John Snow Labs +name: ner_bert_uncased_model +date: 2024-09-08 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_bert_uncased_model` is a English model originally trained by pritmanvar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bert_uncased_model_en_5.5.0_3.0_1725788474802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bert_uncased_model_en_5.5.0_3.0_1725788474802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("ner_bert_uncased_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("ner_bert_uncased_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_bert_uncased_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/pritmanvar/ner_bert_uncased_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-ner_multilingualbert_xx.md b/docs/_posts/ahmedlone127/2024-09-08-ner_multilingualbert_xx.md new file mode 100644 index 00000000000000..9c8393213bff3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-ner_multilingualbert_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual ner_multilingualbert BertForTokenClassification from DrRinS +author: John Snow Labs +name: ner_multilingualbert +date: 2024-09-08 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_multilingualbert` is a Multilingual model originally trained by DrRinS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_multilingualbert_xx_5.5.0_3.0_1725834749104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_multilingualbert_xx_5.5.0_3.0_1725834749104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_multilingualbert","xx") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_multilingualbert", "xx")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
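+
+For low-latency inference on a handful of strings, the fitted model can be wrapped in a `LightPipeline`, which runs on the driver and skips DataFrame creation entirely (a minimal sketch):
+
+```python
+from sparknlp.base import LightPipeline
+
+# Driver-side inference; annotate() returns a dict keyed by the pipeline's output columns.
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["ner"])
+```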
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_multilingualbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/DrRinS/NER_MultilingualBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline_en.md new file mode 100644 index 00000000000000..ec286b0989cadf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline pipeline MarianTransformer from ncduy +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline` is a English model originally trained by ncduy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline_en_5.5.0_3.0_1725795118472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline_en_5.5.0_3.0_1725795118472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ncduy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/ncduy/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-product_review_information_density_detection_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-product_review_information_density_detection_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..403fc30663bd94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-product_review_information_density_detection_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English product_review_information_density_detection_distilbert_pipeline pipeline DistilBertForSequenceClassification from aloychow +author: John Snow Labs +name: product_review_information_density_detection_distilbert_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`product_review_information_density_detection_distilbert_pipeline` is a English model originally trained by aloychow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/product_review_information_density_detection_distilbert_pipeline_en_5.5.0_3.0_1725764662987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/product_review_information_density_detection_distilbert_pipeline_en_5.5.0_3.0_1725764662987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("product_review_information_density_detection_distilbert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("product_review_information_density_detection_distilbert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|product_review_information_density_detection_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aloychow/product-review-information-density-detection-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-qnli_microsoft_deberta_v3_large_seed_1_en.md b/docs/_posts/ahmedlone127/2024-09-08-qnli_microsoft_deberta_v3_large_seed_1_en.md new file mode 100644 index 00000000000000..6b513416825627 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-qnli_microsoft_deberta_v3_large_seed_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English qnli_microsoft_deberta_v3_large_seed_1 DeBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: qnli_microsoft_deberta_v3_large_seed_1 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qnli_microsoft_deberta_v3_large_seed_1` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qnli_microsoft_deberta_v3_large_seed_1_en_5.5.0_3.0_1725811376124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qnli_microsoft_deberta_v3_large_seed_1_en_5.5.0_3.0_1725811376124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("qnli_microsoft_deberta_v3_large_seed_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("qnli_microsoft_deberta_v3_large_seed_1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
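+
+This is a large checkpoint (about 1.6 GB), so memory and throughput tuning can matter more than usual. The classifier exposes batch-size and maximum-sequence-length setters; a sketch with illustrative values, assuming the defaults are too heavy for your hardware:
+
+```python
+# Optional tuning for the large DeBERTa checkpoint (values are illustrative only).
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("qnli_microsoft_deberta_v3_large_seed_1", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class") \
+    .setBatchSize(4) \
+    .setMaxSentenceLength(128)
+```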
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qnli_microsoft_deberta_v3_large_seed_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/utahnlp/qnli_microsoft_deberta-v3-large_seed-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-question_classifier_model_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-question_classifier_model_v2_pipeline_en.md new file mode 100644 index 00000000000000..3dca9c74e8f7f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-question_classifier_model_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English question_classifier_model_v2_pipeline pipeline DistilBertForSequenceClassification from sophiaqho +author: John Snow Labs +name: question_classifier_model_v2_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classifier_model_v2_pipeline` is a English model originally trained by sophiaqho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classifier_model_v2_pipeline_en_5.5.0_3.0_1725764727893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classifier_model_v2_pipeline_en_5.5.0_3.0_1725764727893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("question_classifier_model_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("question_classifier_model_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classifier_model_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sophiaqho/question_classifier_model_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline_en.md new file mode 100644 index 00000000000000..15db3ab30ec03a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline pipeline CamemBertEmbeddings from comartinez +author: John Snow Labs +name: recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline` is a English model originally trained by comartinez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline_en_5.5.0_3.0_1725836759887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline_en_5.5.0_3.0_1725836759887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is a Spark DataFrame holding the raw input text; pipelines exported this way
+# typically read it from a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is a DataFrame holding the raw input text (typically a "text" column)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|recipes_trainer_n_sentences_per_recipe_3_sep_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/comartinez/recipes-trainer_n_sentences_per_recipe_3_sep_True + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-roberta_base_climate_evidence_related_en.md b/docs/_posts/ahmedlone127/2024-09-08-roberta_base_climate_evidence_related_en.md new file mode 100644 index 00000000000000..438b16d189e169 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-roberta_base_climate_evidence_related_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_climate_evidence_related RoBertaForSequenceClassification from mwong +author: John Snow Labs +name: roberta_base_climate_evidence_related +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_climate_evidence_related` is a English model originally trained by mwong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_climate_evidence_related_en_5.5.0_3.0_1725778481010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_climate_evidence_related_en_5.5.0_3.0_1725778481010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_climate_evidence_related","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_climate_evidence_related", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
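+
+Besides the predicted label in `class.result`, each annotation carries a metadata map that typically includes per-label confidence scores, which is useful if you want to threshold evidence-relatedness decisions (a minimal sketch; the exact metadata keys depend on the model's label set):
+
+```python
+# Predicted label plus raw metadata (typically per-label scores) for each document.
+pipelineDF.selectExpr("explode(class) as c") \
+    .selectExpr("c.result as label", "c.metadata as metadata") \
+    .show(truncate=False)
+```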
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_climate_evidence_related| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|300.4 MB| + +## References + +https://huggingface.co/mwong/roberta-base-climate-evidence-related \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-roberta_base_squad2_finetuned_squad_katxtong_en.md b/docs/_posts/ahmedlone127/2024-09-08-roberta_base_squad2_finetuned_squad_katxtong_en.md new file mode 100644 index 00000000000000..07a85368115dc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-roberta_base_squad2_finetuned_squad_katxtong_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_squad2_finetuned_squad_katxtong RoBertaForQuestionAnswering from katxtong +author: John Snow Labs +name: roberta_base_squad2_finetuned_squad_katxtong +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad2_finetuned_squad_katxtong` is a English model originally trained by katxtong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_finetuned_squad_katxtong_en_5.5.0_3.0_1725833208061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_finetuned_squad_katxtong_en_5.5.0_3.0_1725833208061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad2_finetuned_squad_katxtong","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad2_finetuned_squad_katxtong", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
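+
+The span classifier writes its prediction to the `answer` column, so the extracted answer text can be read back directly (a minimal sketch):
+
+```python
+# One predicted answer span per (question, context) row.
+pipelineDF.select("answer.result").show(truncate=False)
+```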
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad2_finetuned_squad_katxtong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.5 MB| + +## References + +https://huggingface.co/katxtong/roberta-base-squad2-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-roberta_finetuned_subjqa_movies_2_ram20307_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-roberta_finetuned_subjqa_movies_2_ram20307_pipeline_en.md new file mode 100644 index 00000000000000..627f13a47c5cd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-roberta_finetuned_subjqa_movies_2_ram20307_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_ram20307_pipeline pipeline RoBertaForQuestionAnswering from Ram20307 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_ram20307_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_ram20307_pipeline` is a English model originally trained by Ram20307. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_ram20307_pipeline_en_5.5.0_3.0_1725758133096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_ram20307_pipeline_en_5.5.0_3.0_1725758133096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_subjqa_movies_2_ram20307_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_movies_2_ram20307_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_ram20307_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Ram20307/roberta-finetuned-subjqa-movies_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-schemeclassifier_eng_en.md b/docs/_posts/ahmedlone127/2024-09-08-schemeclassifier_eng_en.md new file mode 100644 index 00000000000000..31b76fa15d92f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-schemeclassifier_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English schemeclassifier_eng RoBertaForSequenceClassification from raruidol +author: John Snow Labs +name: schemeclassifier_eng +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schemeclassifier_eng` is a English model originally trained by raruidol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schemeclassifier_eng_en_5.5.0_3.0_1725830821795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schemeclassifier_eng_en_5.5.0_3.0_1725830821795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("schemeclassifier_eng","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("schemeclassifier_eng", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
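+ +After `transform`, the predicted label sits in the `result` field of the `class` output column; a minimal sketch, assuming the standard Spark NLP annotation schema (the `predicted_label` alias is illustrative only). + +```python + +from pyspark.sql.functions import col + +# Show each input text next to the label predicted by the snippet above. +pipelineDF.select(col("text"), col("class.result").alias("predicted_label")).show(truncate=False) + +```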
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schemeclassifier_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/raruidol/SchemeClassifier-ENG \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-self_harm_bert_en.md b/docs/_posts/ahmedlone127/2024-09-08-self_harm_bert_en.md new file mode 100644 index 00000000000000..f5ec86de2a57ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-self_harm_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English self_harm_bert BertForSequenceClassification from dkuzmenko +author: John Snow Labs +name: self_harm_bert +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`self_harm_bert` is a English model originally trained by dkuzmenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/self_harm_bert_en_5.5.0_3.0_1725825806901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/self_harm_bert_en_5.5.0_3.0_1725825806901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("self_harm_bert","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("self_harm_bert", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|self_harm_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|627.7 MB| + +## References + +https://huggingface.co/dkuzmenko/self-harm-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline_en.md new file mode 100644 index 00000000000000..c852196f8e6140 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline pipeline XlmRoBertaSentenceEmbeddings from SmartPy +author: John Snow Labs +name: sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline` is a English model originally trained by SmartPy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline_en_5.5.0_3.0_1725813365707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline_en_5.5.0_3.0_1725813365707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_roberta_base_finetuned_burmese_dear_watson2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/SmartPy/xlm-roberta-base-finetuned-my_dear_watson2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-somd_xlm_3stage_stage1_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-somd_xlm_3stage_stage1_v1_pipeline_en.md new file mode 100644 index 00000000000000..355f160669d579 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-somd_xlm_3stage_stage1_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English somd_xlm_3stage_stage1_v1_pipeline pipeline XlmRoBertaForTokenClassification from ThuyNT03 +author: John Snow Labs +name: somd_xlm_3stage_stage1_v1_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`somd_xlm_3stage_stage1_v1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/somd_xlm_3stage_stage1_v1_pipeline_en_5.5.0_3.0_1725806785812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/somd_xlm_3stage_stage1_v1_pipeline_en_5.5.0_3.0_1725806785812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("somd_xlm_3stage_stage1_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("somd_xlm_3stage_stage1_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|somd_xlm_3stage_stage1_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|779.7 MB| + +## References + +https://huggingface.co/ThuyNT03/SOMD-xlm-3stage-stage1-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_pipeline_en.md new file mode 100644 index 00000000000000..58eff253f48adc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spark_name_armenian_tonga_tonga_islands_english_pipeline pipeline MarianTransformer from ihebaker10 +author: John Snow Labs +name: spark_name_armenian_tonga_tonga_islands_english_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spark_name_armenian_tonga_tonga_islands_english_pipeline` is a English model originally trained by ihebaker10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spark_name_armenian_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1725825000733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spark_name_armenian_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1725825000733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spark_name_armenian_tonga_tonga_islands_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spark_name_armenian_tonga_tonga_islands_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spark_name_armenian_tonga_tonga_islands_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|524.1 MB| + +## References + +https://huggingface.co/ihebaker10/spark-name-hy-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline_en.md new file mode 100644 index 00000000000000..e9bffec21f1658 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline_en_5.5.0_3.0_1725775576005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline_en_5.5.0_3.0_1725775576005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
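+ +The `df` passed to `transform` above is any Spark DataFrame the saved pipeline can consume. A hedged sketch of preparing one, assuming the pipeline's DocumentAssembler reads a `text` column as in the standalone examples in this series; `annotate()` is the lighter-weight call for a single string. + +```python + +# Hypothetical input DataFrame with the assumed `text` column. +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +# For a single string, annotate() returns a plain Python dict keyed by output column. +result = pipeline.annotate("I love spark-nlp") + +```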
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_50_2024_07_26_14_26_52_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-50-2024-07-26_14-26-52 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-test_squad_xuyilx_en.md b/docs/_posts/ahmedlone127/2024-09-08-test_squad_xuyilx_en.md new file mode 100644 index 00000000000000..8a7a39b28f926c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-test_squad_xuyilx_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English test_squad_xuyilx DistilBertForQuestionAnswering from XuYiLX +author: John Snow Labs +name: test_squad_xuyilx +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_squad_xuyilx` is a English model originally trained by XuYiLX. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_squad_xuyilx_en_5.5.0_3.0_1725823305510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_squad_xuyilx_en_5.5.0_3.0_1725823305510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCols(["question", "context"]) \ + .setOutputCols(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("test_squad_xuyilx","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCols(Array("question", "context")) + .setOutputCols(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("test_squad_xuyilx", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_squad_xuyilx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/XuYiLX/test-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline_en.md new file mode 100644 index 00000000000000..17aff1bfd71a7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline_en_5.5.0_3.0_1725784741358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline_en_5.5.0_3.0_1725784741358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_and_normalized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.6 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-earnings21-non-normalized-and-normalized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-tweetcat_en.md b/docs/_posts/ahmedlone127/2024-09-08-tweetcat_en.md new file mode 100644 index 00000000000000..8ef58249d1da6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-tweetcat_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tweetcat RoBertaForSequenceClassification from ivanresh +author: John Snow Labs +name: tweetcat +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tweetcat` is a English model originally trained by ivanresh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tweetcat_en_5.5.0_3.0_1725778615206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tweetcat_en_5.5.0_3.0_1725778615206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("tweetcat","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("tweetcat", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tweetcat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|425.7 MB| + +## References + +https://huggingface.co/ivanresh/TweetCat \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-vaccine_tweet_sentiments_analysis_model_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-vaccine_tweet_sentiments_analysis_model_2_pipeline_en.md new file mode 100644 index 00000000000000..9ec7c81e97a241 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-vaccine_tweet_sentiments_analysis_model_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English vaccine_tweet_sentiments_analysis_model_2_pipeline pipeline RoBertaForSequenceClassification from Ausbel +author: John Snow Labs +name: vaccine_tweet_sentiments_analysis_model_2_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vaccine_tweet_sentiments_analysis_model_2_pipeline` is a English model originally trained by Ausbel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vaccine_tweet_sentiments_analysis_model_2_pipeline_en_5.5.0_3.0_1725829681996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vaccine_tweet_sentiments_analysis_model_2_pipeline_en_5.5.0_3.0_1725829681996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("vaccine_tweet_sentiments_analysis_model_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("vaccine_tweet_sentiments_analysis_model_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vaccine_tweet_sentiments_analysis_model_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/Ausbel/Vaccine-tweet-sentiments-analysis-model-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline_en.md new file mode 100644 index 00000000000000..fa6c6ac67320f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline pipeline XlmRoBertaForTokenClassification from penguinman73 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline` is a English model originally trained by penguinman73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline_en_5.5.0_3.0_1725807464700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline_en_5.5.0_3.0_1725807464700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_penguinman73_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/penguinman73/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..fa2c8a0743b91f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline` is a English model originally trained by jjglilleberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline_en_5.5.0_3.0_1725805915302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline_en_5.5.0_3.0_1725805915302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_german_french_huangjia_en.md b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_german_french_huangjia_en.md new file mode 100644 index 00000000000000..131a3c49daa786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_german_french_huangjia_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_huangjia XlmRoBertaForTokenClassification from huangjia +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_huangjia +date: 2024-09-08 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_huangjia` is a English model originally trained by huangjia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huangjia_en_5.5.0_3.0_1725784879881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huangjia_en_5.5.0_3.0_1725784879881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_huangjia","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_huangjia", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
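+ +Each token receives one tag in the `ner` output column; pairing it with the `token` column gives a readable view. A minimal sketch, assuming the standard Spark NLP annotation schema (the `tokens`/`ner_tags` aliases are illustrative only). + +```python + +# Show the token list and the tag list produced by the snippet above, side by side. +pipelineDF.selectExpr("token.result AS tokens", "ner.result AS ner_tags").show(truncate=False) + +```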
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_huangjia| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/huangjia/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_italian_mj03_en.md b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_italian_mj03_en.md new file mode 100644 index 00000000000000..dfc03a2605eff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xlm_roberta_base_finetuned_panx_italian_mj03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_mj03 XlmRoBertaForTokenClassification from MJ03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_mj03 +date: 2024-09-08 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_mj03` is a English model originally trained by MJ03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_mj03_en_5.5.0_3.0_1725785504359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_mj03_en_5.5.0_3.0_1725785504359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_mj03","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_mj03", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_mj03| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/MJ03/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xlmroberta_ner_pglauner_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-08-xlmroberta_ner_pglauner_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..4482d03df3256b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xlmroberta_ner_pglauner_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_pglauner_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from pglauner +author: John Snow Labs +name: xlmroberta_ner_pglauner_base_finetuned_panx_pipeline +date: 2024-09-08 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_pglauner_base_finetuned_panx_pipeline` is a German model originally trained by pglauner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pglauner_base_finetuned_panx_pipeline_de_5.5.0_3.0_1725807663141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pglauner_base_finetuned_panx_pipeline_de_5.5.0_3.0_1725807663141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_pglauner_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_pglauner_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_pglauner_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/pglauner/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-your_model_name_en.md b/docs/_posts/ahmedlone127/2024-09-08-your_model_name_en.md new file mode 100644 index 00000000000000..4f7c062c8ee6b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-your_model_name_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English your_model_name DistilBertEmbeddings from SangwooJ +author: John Snow Labs +name: your_model_name +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`your_model_name` is a English model originally trained by SangwooJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_model_name_en_5.5.0_3.0_1725828688266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_model_name_en_5.5.0_3.0_1725828688266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("your_model_name","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("your_model_name","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
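+ +Each annotation in the `embeddings` output column carries the token text in `result` and its vector in the `embeddings` field; a minimal sketch, assuming the standard Spark NLP annotation schema (the `token`/`dimensions` aliases are illustrative only). + +```python + +# One row per token: the token text and the length of its embedding vector. +pipelineDF.selectExpr("explode(embeddings) AS emb") \ + .selectExpr("emb.result AS token", "size(emb.embeddings) AS dimensions") \ + .show(truncate=False) + +```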
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_model_name| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/SangwooJ/your-model-name \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-2020_q1_90p_filtered_en.md b/docs/_posts/ahmedlone127/2024-09-09-2020_q1_90p_filtered_en.md new file mode 100644 index 00000000000000..19534ff8bd9906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-2020_q1_90p_filtered_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_90p_filtered RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_90p_filtered +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_90p_filtered` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_90p_filtered_en_5.5.0_3.0_1725925433205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_90p_filtered_en_5.5.0_3.0_1725925433205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q1_90p_filtered","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q1_90p_filtered","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_90p_filtered| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-90p-filtered \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-albert_persian_farsi_base_v2_clf_digimag_fa.md b/docs/_posts/ahmedlone127/2024-09-09-albert_persian_farsi_base_v2_clf_digimag_fa.md new file mode 100644 index 00000000000000..50417936227be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-albert_persian_farsi_base_v2_clf_digimag_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian albert_persian_farsi_base_v2_clf_digimag AlbertForSequenceClassification from m3hrdadfi +author: John Snow Labs +name: albert_persian_farsi_base_v2_clf_digimag +date: 2024-09-09 +tags: [fa, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_persian_farsi_base_v2_clf_digimag` is a Persian model originally trained by m3hrdadfi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_persian_farsi_base_v2_clf_digimag_fa_5.5.0_3.0_1725889138230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_persian_farsi_base_v2_clf_digimag_fa_5.5.0_3.0_1725889138230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_persian_farsi_base_v2_clf_digimag","fa") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_persian_farsi_base_v2_clf_digimag", "fa") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_persian_farsi_base_v2_clf_digimag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fa| +|Size:|68.6 MB| + +## References + +https://huggingface.co/m3hrdadfi/albert-fa-base-v2-clf-digimag \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline_en.md new file mode 100644 index 00000000000000..81e09a83a7c903 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline pipeline MPNetEmbeddings from juanpablomesa +author: John Snow Labs +name: all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline` is a English model originally trained by juanpablomesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline_en_5.5.0_3.0_1725897153640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline_en_5.5.0_3.0_1725897153640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_bioasq_1epoc_batch32_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/juanpablomesa/all-mpnet-base-v2-bioasq-1epoc-batch32-100 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-all_mpnet_base_v2_celanese_test_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-09-all_mpnet_base_v2_celanese_test_nepal_bhasa_en.md new file mode 100644 index 00000000000000..c7dbf6580c0492 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-all_mpnet_base_v2_celanese_test_nepal_bhasa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_mpnet_base_v2_celanese_test_nepal_bhasa MPNetEmbeddings from testCelUR +author: John Snow Labs +name: all_mpnet_base_v2_celanese_test_nepal_bhasa +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_celanese_test_nepal_bhasa` is a English model originally trained by testCelUR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_celanese_test_nepal_bhasa_en_5.5.0_3.0_1725874707366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_celanese_test_nepal_bhasa_en_5.5.0_3.0_1725874707366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_celanese_test_nepal_bhasa","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_celanese_test_nepal_bhasa","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_celanese_test_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/testCelUR/all-mpnet-base-v2-celanese_test_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-angela_diacritics_untranslated_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-angela_diacritics_untranslated_eval_pipeline_en.md new file mode 100644 index 00000000000000..0a4d6cae9424c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-angela_diacritics_untranslated_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_diacritics_untranslated_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_diacritics_untranslated_eval_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_diacritics_untranslated_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_diacritics_untranslated_eval_pipeline_en_5.5.0_3.0_1725922741832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_diacritics_untranslated_eval_pipeline_en_5.5.0_3.0_1725922741832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_diacritics_untranslated_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_diacritics_untranslated_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_diacritics_untranslated_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_diacritics_untranslated_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-arabic2_en.md b/docs/_posts/ahmedlone127/2024-09-09-arabic2_en.md new file mode 100644 index 00000000000000..71bc08acedc971 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-arabic2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English arabic2 MarianTransformer from PontifexMaximus +author: John Snow Labs +name: arabic2 +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabic2` is a English model originally trained by PontifexMaximus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabic2_en_5.5.0_3.0_1725891681715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabic2_en_5.5.0_3.0_1725891681715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("arabic2","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("arabic2","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
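With the column wiring above, the translated text is returned as annotations in the `translation` column. An illustrative way to read it back:

```python
# Illustrative: collect the translated strings produced by the MarianTransformer stage.
pipelineDF.selectExpr("explode(translation.result) as translated_text").show(truncate=False)
```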
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabic2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|527.9 MB| + +## References + +https://huggingface.co/PontifexMaximus/Arabic2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-bert_squad_sabrinah_en.md b/docs/_posts/ahmedlone127/2024-09-09-bert_squad_sabrinah_en.md new file mode 100644 index 00000000000000..eac08384dceb7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-bert_squad_sabrinah_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_squad_sabrinah DistilBertForQuestionAnswering from sabrinah +author: John Snow Labs +name: bert_squad_sabrinah +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_squad_sabrinah` is a English model originally trained by sabrinah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_squad_sabrinah_en_5.5.0_3.0_1725892391147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_squad_sabrinah_en_5.5.0_3.0_1725892391147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("bert_squad_sabrinah","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("bert_squad_sabrinah", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
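The predicted span lands in the `answer` annotation column. A small, illustrative query to view each question next to its answer:

```python
# Illustrative: show each question with the predicted answer span.
pipelineDF.selectExpr(
    "document_question.result[0] as question",
    "answer.result[0] as predicted_answer"
).show(truncate=False)
```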
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_squad_sabrinah| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sabrinah/BERT-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-best_model_yelp_polarity_64_42_en.md b/docs/_posts/ahmedlone127/2024-09-09-best_model_yelp_polarity_64_42_en.md new file mode 100644 index 00000000000000..a46602caabe06a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-best_model_yelp_polarity_64_42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English best_model_yelp_polarity_64_42 AlbertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_yelp_polarity_64_42 +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_yelp_polarity_64_42` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_64_42_en_5.5.0_3.0_1725924138729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_64_42_en_5.5.0_3.0_1725924138729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("best_model_yelp_polarity_64_42","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("best_model_yelp_polarity_64_42", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
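The predicted label is stored as an annotation in the `class` column. One illustrative way to surface it (back-ticks keep Spark SQL from treating `class` as a keyword):

```python
# Illustrative: extract the predicted polarity label for each input row.
pipelineDF.selectExpr("text", "`class`.result[0] as predicted_label").show(truncate=False)
```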
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_yelp_polarity_64_42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/simonycl/best_model-yelp_polarity-64-42 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-bge_base_financial_matryoshka_dpokhrel_en.md b/docs/_posts/ahmedlone127/2024-09-09-bge_base_financial_matryoshka_dpokhrel_en.md new file mode 100644 index 00000000000000..4de9b9ff84215e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-bge_base_financial_matryoshka_dpokhrel_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_dpokhrel BGEEmbeddings from dpokhrel +author: John Snow Labs +name: bge_base_financial_matryoshka_dpokhrel +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_financial_matryoshka_dpokhrel` is a English model originally trained by dpokhrel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_dpokhrel_en_5.5.0_3.0_1725916736517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_dpokhrel_en_5.5.0_3.0_1725916736517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_dpokhrel","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_dpokhrel","en")
    .setInputCols(Array("document"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
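BGE embeddings are typically used for retrieval-style comparisons. A hedged sketch that embeds two texts and scores them with cosine similarity; numpy and the sample texts are assumptions for illustration, not Spark NLP requirements.

```python
# Illustrative: compare two texts with cosine similarity over their BGE embeddings.
import numpy as np

texts = spark.createDataFrame(
    [["What was the quarterly revenue?"], ["Revenue figures for the third quarter"]]
).toDF("text")

rows = pipelineModel.transform(texts).selectExpr("embeddings.embeddings[0] as vec").collect()
a, b = np.array(rows[0].vec), np.array(rows[1].vec)
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```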
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_dpokhrel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/dpokhrel/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline_en.md new file mode 100644 index 00000000000000..575e890d05dc00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline pipeline MPNetEmbeddings from teven +author: John Snow Labs +name: bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline` is a English model originally trained by teven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline_en_5.5.0_3.0_1725874553223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline_en_5.5.0_3.0_1725874553223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bislama_all_bs160_allneg_finetuned_webnlg2020_data_coverage_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/teven/bi_all_bs160_allneg_finetuned_WebNLG2020_data_coverage + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-bool_en.md b/docs/_posts/ahmedlone127/2024-09-09-bool_en.md new file mode 100644 index 00000000000000..09655b51d63624 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-bool_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bool BertForSequenceClassification from Mahmoud3899 +author: John Snow Labs +name: bool +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bool` is a English model originally trained by Mahmoud3899. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bool_en_5.5.0_3.0_1725900571854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bool_en_5.5.0_3.0_1725900571854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bool","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bool", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bool| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/Mahmoud3899/bool \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_aiyaz_en.md b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_aiyaz_en.md new file mode 100644 index 00000000000000..f7ab4e0e80a3b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_aiyaz_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_aiyaz DistilBertForQuestionAnswering from aiyaz +author: John Snow Labs +name: burmese_awesome_qa_model_aiyaz +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_aiyaz` is a English model originally trained by aiyaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_aiyaz_en_5.5.0_3.0_1725892518504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_aiyaz_en_5.5.0_3.0_1725892518504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_aiyaz","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_aiyaz", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_aiyaz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/aiyaz/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_ashch_en.md b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_ashch_en.md new file mode 100644 index 00000000000000..18342ed73a24e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_ashch_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ashch DistilBertForQuestionAnswering from ashch +author: John Snow Labs +name: burmese_awesome_qa_model_ashch +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ashch` is a English model originally trained by ashch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ashch_en_5.5.0_3.0_1725869359003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ashch_en_5.5.0_3.0_1725869359003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ashch","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ashch", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ashch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ashch/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_vhung_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_vhung_pipeline_en.md new file mode 100644 index 00000000000000..360616ceaf5b64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_vhung_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_vhung_pipeline pipeline DistilBertForQuestionAnswering from VHung +author: John Snow Labs +name: burmese_awesome_qa_model_vhung_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_vhung_pipeline` is a English model originally trained by VHung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_vhung_pipeline_en_5.5.0_3.0_1725892741508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_vhung_pipeline_en_5.5.0_3.0_1725892741508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_vhung_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_vhung_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_vhung_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/VHung/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-chai_deberta_v3_base_reward_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-chai_deberta_v3_base_reward_model_pipeline_en.md new file mode 100644 index 00000000000000..50bd40e58dbfe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-chai_deberta_v3_base_reward_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English chai_deberta_v3_base_reward_model_pipeline pipeline DeBertaForSequenceClassification from decem +author: John Snow Labs +name: chai_deberta_v3_base_reward_model_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chai_deberta_v3_base_reward_model_pipeline` is a English model originally trained by decem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chai_deberta_v3_base_reward_model_pipeline_en_5.5.0_3.0_1725859853854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chai_deberta_v3_base_reward_model_pipeline_en_5.5.0_3.0_1725859853854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chai_deberta_v3_base_reward_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chai_deberta_v3_base_reward_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
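For quick checks on a single string, `PretrainedPipeline` also exposes `annotate`, which returns a plain Python dict instead of a DataFrame. An illustrative call:

```python
# Illustrative: ad-hoc scoring of one text without building a DataFrame.
result = pipeline.annotate("The assistant reply was friendly and on topic.")
print(result.keys())  # output columns, including the predicted class
```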
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chai_deberta_v3_base_reward_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|634.4 MB| + +## References + +https://huggingface.co/decem/chai-deberta-v3-base-reward-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-chai_reward_deberta_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-09-chai_reward_deberta_classifier_en.md new file mode 100644 index 00000000000000..83d0be50e4e34a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-chai_reward_deberta_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English chai_reward_deberta_classifier DeBertaForSequenceClassification from hflserdaniel +author: John Snow Labs +name: chai_reward_deberta_classifier +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chai_reward_deberta_classifier` is a English model originally trained by hflserdaniel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chai_reward_deberta_classifier_en_5.5.0_3.0_1725849716737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chai_reward_deberta_classifier_en_5.5.0_3.0_1725849716737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("chai_reward_deberta_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("chai_reward_deberta_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chai_reward_deberta_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|634.2 MB| + +## References + +https://huggingface.co/hflserdaniel/chai_reward_deberta_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-coha1850s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-coha1850s_pipeline_en.md new file mode 100644 index 00000000000000..63a2424ccc197a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-coha1850s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1850s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1850s_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1850s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1850s_pipeline_en_5.5.0_3.0_1725926007193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1850s_pipeline_en_5.5.0_3.0_1725926007193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1850s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1850s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1850s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|312.2 MB| + +## References + +https://huggingface.co/simonmun/COHA1850s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-cold_fusion_itr12_seed2_en.md b/docs/_posts/ahmedlone127/2024-09-09-cold_fusion_itr12_seed2_en.md new file mode 100644 index 00000000000000..558e70ed0e915b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-cold_fusion_itr12_seed2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr12_seed2 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr12_seed2 +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr12_seed2` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr12_seed2_en_5.5.0_3.0_1725920339934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr12_seed2_en_5.5.0_3.0_1725920339934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr12_seed2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr12_seed2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr12_seed2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr12-seed2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-cold_fusion_itr12_seed2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-cold_fusion_itr12_seed2_pipeline_en.md new file mode 100644 index 00000000000000..4859a5ff7703b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-cold_fusion_itr12_seed2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr12_seed2_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr12_seed2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr12_seed2_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr12_seed2_pipeline_en_5.5.0_3.0_1725920363445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr12_seed2_pipeline_en_5.5.0_3.0_1725920363445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr12_seed2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr12_seed2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr12_seed2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr12-seed2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-deberta_disaster_tweet_recognizer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-deberta_disaster_tweet_recognizer_pipeline_en.md new file mode 100644 index 00000000000000..e4bb0034459300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-deberta_disaster_tweet_recognizer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_disaster_tweet_recognizer_pipeline pipeline DeBertaForSequenceClassification from onegoodlad +author: John Snow Labs +name: deberta_disaster_tweet_recognizer_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_disaster_tweet_recognizer_pipeline` is a English model originally trained by onegoodlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_disaster_tweet_recognizer_pipeline_en_5.5.0_3.0_1725859594367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_disaster_tweet_recognizer_pipeline_en_5.5.0_3.0_1725859594367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_disaster_tweet_recognizer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_disaster_tweet_recognizer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_disaster_tweet_recognizer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/onegoodlad/deberta-disaster-tweet-recognizer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-deberta_v3_base_4bit_64rank_backbone_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-deberta_v3_base_4bit_64rank_backbone_pipeline_en.md new file mode 100644 index 00000000000000..0b08dbf29fb88d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-deberta_v3_base_4bit_64rank_backbone_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_base_4bit_64rank_backbone_pipeline pipeline DeBertaForSequenceClassification from yxli2123 +author: John Snow Labs +name: deberta_v3_base_4bit_64rank_backbone_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_4bit_64rank_backbone_pipeline` is a English model originally trained by yxli2123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_4bit_64rank_backbone_pipeline_en_5.5.0_3.0_1725859611807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_4bit_64rank_backbone_pipeline_en_5.5.0_3.0_1725859611807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_base_4bit_64rank_backbone_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_base_4bit_64rank_backbone_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_4bit_64rank_backbone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|689.9 MB| + +## References + +https://huggingface.co/yxli2123/deberta-v3-base-4bit-64rank-backbone + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-debertaemotionbalanced_en.md b/docs/_posts/ahmedlone127/2024-09-09-debertaemotionbalanced_en.md new file mode 100644 index 00000000000000..f44aff5200a710 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-debertaemotionbalanced_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English debertaemotionbalanced DeBertaForSequenceClassification from aliciiavs +author: John Snow Labs +name: debertaemotionbalanced +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`debertaemotionbalanced` is a English model originally trained by aliciiavs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/debertaemotionbalanced_en_5.5.0_3.0_1725859749248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/debertaemotionbalanced_en_5.5.0_3.0_1725859749248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("debertaemotionbalanced","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("debertaemotionbalanced", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|debertaemotionbalanced| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|433.7 MB| + +## References + +https://huggingface.co/aliciiavs/debertaemotionbalanced \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_akabot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_akabot_pipeline_en.md new file mode 100644 index 00000000000000..8495ed662b7cf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_akabot_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_akabot_pipeline pipeline DistilBertForQuestionAnswering from avisena +author: John Snow Labs +name: distilbert_akabot_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_akabot_pipeline` is a English model originally trained by avisena. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_akabot_pipeline_en_5.5.0_3.0_1725869128782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_akabot_pipeline_en_5.5.0_3.0_1725869128782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_akabot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_akabot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_akabot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/avisena/distilbert_akabot + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_emotion_saneryi_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_emotion_saneryi_en.md new file mode 100644 index 00000000000000..50f3950d401ddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_emotion_saneryi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_saneryi DistilBertForSequenceClassification from Saneryi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_saneryi +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_saneryi` is a English model originally trained by Saneryi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_saneryi_en_5.5.0_3.0_1725873455455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_saneryi_en_5.5.0_3.0_1725873455455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_saneryi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_saneryi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_saneryi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Saneryi/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z_en.md new file mode 100644 index 00000000000000..ff844098070c37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z DistilBertEmbeddings from Lifan-Z +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z` is a English model originally trained by Lifan-Z. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z_en_5.5.0_3.0_1725868204370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z_en_5.5.0_3.0_1725868204370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
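+
+As an optional follow-up sketch (assuming the Python example above has been executed), the token-level vectors can be pulled out of the `embeddings` column like this:
+
+```python
+# Explode the annotations so each row holds one token and its embedding vector.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(5, truncate=80)
+```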
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_accelerate_lifan_z| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Lifan-Z/distilbert-base-uncased-finetuned-imdb-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_bluspark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_bluspark_pipeline_en.md new file mode 100644 index 00000000000000..4fc3b1ba3890f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_bluspark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_bluspark_pipeline pipeline DistilBertEmbeddings from bluspark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_bluspark_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_bluspark_pipeline` is a English model originally trained by bluspark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_bluspark_pipeline_en_5.5.0_3.0_1725878232253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_bluspark_pipeline_en_5.5.0_3.0_1725878232253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_bluspark_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_bluspark_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
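+
+The `df` referenced above is any Spark DataFrame with a `text` column. A minimal, self-contained sketch (assuming an active Spark NLP session is available as `spark`; the sample sentence is only a placeholder):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Build a one-row DataFrame with the "text" column the pipeline expects.
+df = spark.createDataFrame([["Masked language modeling with Spark NLP."]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_bluspark_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+# Inspect which output columns the pipeline adds before selecting from them.
+annotations.printSchema()
+```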
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_bluspark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/bluspark/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline_en.md new file mode 100644 index 00000000000000..21076d232e1b73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline pipeline DistilBertEmbeddings from hachiiiii +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline` is a English model originally trained by hachiiiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline_en_5.5.0_3.0_1725921712003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline_en_5.5.0_3.0_1725921712003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_hachiiiii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hachiiiii/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ramification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ramification_pipeline_en.md new file mode 100644 index 00000000000000..2d0acfe7957024 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ramification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_ramification_pipeline pipeline DistilBertEmbeddings from ramification +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_ramification_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_ramification_pipeline` is a English model originally trained by ramification. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_ramification_pipeline_en_5.5.0_3.0_1725909455943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_ramification_pipeline_en_5.5.0_3.0_1725909455943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_ramification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_ramification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_ramification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ramification/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_vonewman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_vonewman_pipeline_en.md new file mode 100644 index 00000000000000..7abeae538c4629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_vonewman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_vonewman_pipeline pipeline DistilBertEmbeddings from vonewman +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_vonewman_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_vonewman_pipeline` is a English model originally trained by vonewman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_vonewman_pipeline_en_5.5.0_3.0_1725905879833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_vonewman_pipeline_en_5.5.0_3.0_1725905879833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_vonewman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_vonewman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_vonewman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/vonewman/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_ner_sudheer55_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_ner_sudheer55_en.md new file mode 100644 index 00000000000000..24ff827df52fa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_ner_sudheer55_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_ner_sudheer55 DistilBertForTokenClassification from Sudheer55 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_ner_sudheer55 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_ner_sudheer55` is a English model originally trained by Sudheer55. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_sudheer55_en_5.5.0_3.0_1725889773204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_sudheer55_en_5.5.0_3.0_1725889773204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_finetuned_ner_sudheer55","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_finetuned_ner_sudheer55", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
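+
+Once the example above has run, the per-token predictions live in the `ner` output column. A minimal sketch for viewing them next to the tokens:
+
+```python
+# `token.result` and `ner.result` are parallel arrays of tokens and predicted tags.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```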
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_ner_sudheer55| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Sudheer55/distilbert-base-uncased-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_azyren_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_azyren_en.md new file mode 100644 index 00000000000000..5924a0711e79ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_azyren_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_azyren DistilBertForQuestionAnswering from Azyren +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_azyren +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_azyren` is a English model originally trained by Azyren. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_azyren_en_5.5.0_3.0_1725877146469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_azyren_en_5.5.0_3.0_1725877146469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_azyren","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_azyren", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
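+
+After the example above has been applied, the predicted answer span is stored in the `answer` output column. A minimal follow-up sketch:
+
+```python
+# Show the question together with the extracted answer text.
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```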
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_azyren| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Azyren/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr_en.md new file mode 100644 index 00000000000000..6ace7d3ed5532b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr DistilBertEmbeddings from iotengtr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr +date: 2024-09-09 +tags: [distilbert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr` is a English model originally trained by iotengtr. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr_en_5.5.0_3.0_1725892828216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr_en_5.5.0_3.0_1725892828216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])
+
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val embeddings = DistilBertEmbeddings
+    .pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))
+
+val data = Seq("I love spark-nlp").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_iotengtr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/iotengtr/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_nyh8570_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_nyh8570_en.md new file mode 100644 index 00000000000000..b7b5e0cd755c88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_nyh8570_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_nyh8570 DistilBertForQuestionAnswering from nyh8570 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_nyh8570 +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_nyh8570` is a English model originally trained by nyh8570. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_nyh8570_en_5.5.0_3.0_1725869120961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_nyh8570_en_5.5.0_3.0_1725869120961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_nyh8570","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_nyh8570", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_nyh8570| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/nyh8570/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_shahwali_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_shahwali_en.md new file mode 100644 index 00000000000000..56ee37ba744d95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_shahwali_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_shahwali DistilBertForQuestionAnswering from ShahWali +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_shahwali +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_shahwali` is a English model originally trained by ShahWali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_shahwali_en_5.5.0_3.0_1725892713689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_shahwali_en_5.5.0_3.0_1725892713689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_shahwali","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_shahwali", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_shahwali| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ShahWali/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_sanskrit_saskta_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_sanskrit_saskta_pre_training_complete_en.md new file mode 100644 index 00000000000000..4a52a270e1e2c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_sanskrit_saskta_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_pre_training_complete DistilBertEmbeddings from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_pre_training_complete +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1725905895436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1725905895436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_sanskrit_saskta_pre_training_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_sanskrit_saskta_pre_training_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_uncased_names_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_uncased_names_accelerate_en.md new file mode 100644 index 00000000000000..534ef6ee00b4a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_uncased_names_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_uncased_names_accelerate DistilBertForTokenClassification from jackfriedson +author: John Snow Labs +name: distilbert_uncased_names_accelerate +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uncased_names_accelerate` is a English model originally trained by jackfriedson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uncased_names_accelerate_en_5.5.0_3.0_1725889970584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uncased_names_accelerate_en_5.5.0_3.0_1725889970584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_uncased_names_accelerate","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_uncased_names_accelerate", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uncased_names_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/jackfriedson/distilbert-uncased-names-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-electra_embeddings_delectra_generator_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-09-electra_embeddings_delectra_generator_pipeline_ko.md new file mode 100644 index 00000000000000..05de7826a506a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-electra_embeddings_delectra_generator_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean electra_embeddings_delectra_generator_pipeline pipeline BertEmbeddings from deeq +author: John Snow Labs +name: electra_embeddings_delectra_generator_pipeline +date: 2024-09-09 +tags: [ko, open_source, pipeline, onnx] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`electra_embeddings_delectra_generator_pipeline` is a Korean model originally trained by deeq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/electra_embeddings_delectra_generator_pipeline_ko_5.5.0_3.0_1725882284679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/electra_embeddings_delectra_generator_pipeline_ko_5.5.0_3.0_1725882284679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("electra_embeddings_delectra_generator_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("electra_embeddings_delectra_generator_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|electra_embeddings_delectra_generator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|138.7 MB| + +## References + +https://huggingface.co/deeq/delectra-generator + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-english_coptic_multiple_languages_norm_group_greekified_en.md b/docs/_posts/ahmedlone127/2024-09-09-english_coptic_multiple_languages_norm_group_greekified_en.md new file mode 100644 index 00000000000000..0fdc52200f0073 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-english_coptic_multiple_languages_norm_group_greekified_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_coptic_multiple_languages_norm_group_greekified MarianTransformer from megalaa +author: John Snow Labs +name: english_coptic_multiple_languages_norm_group_greekified +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_coptic_multiple_languages_norm_group_greekified` is a English model originally trained by megalaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_coptic_multiple_languages_norm_group_greekified_en_5.5.0_3.0_1725863606635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_coptic_multiple_languages_norm_group_greekified_en_5.5.0_3.0_1725863606635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("english_coptic_multiple_languages_norm_group_greekified","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("english_coptic_multiple_languages_norm_group_greekified","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
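+
+When the example above is run, the translated sentences end up in the `translation` output column. A minimal sketch for reading them back:
+
+```python
+# Each annotation in "translation" holds one translated sentence in `result`.
+pipelineDF.select("translation.result").show(truncate=False)
+```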
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_coptic_multiple_languages_norm_group_greekified| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|530.3 MB| + +## References + +https://huggingface.co/megalaa/en-cop-mul-norm-group-greekified \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-english_tonga_tonga_islands_romanian_en.md b/docs/_posts/ahmedlone127/2024-09-09-english_tonga_tonga_islands_romanian_en.md new file mode 100644 index 00000000000000..a09b0550950484 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-english_tonga_tonga_islands_romanian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_tonga_tonga_islands_romanian MarianTransformer from sanjeev498 +author: John Snow Labs +name: english_tonga_tonga_islands_romanian +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_tonga_tonga_islands_romanian` is a English model originally trained by sanjeev498. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_tonga_tonga_islands_romanian_en_5.5.0_3.0_1725864317461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_tonga_tonga_islands_romanian_en_5.5.0_3.0_1725864317461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("english_tonga_tonga_islands_romanian","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("english_tonga_tonga_islands_romanian","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_tonga_tonga_islands_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/sanjeev498/en-to-romanian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-esberto_small_en.md b/docs/_posts/ahmedlone127/2024-09-09-esberto_small_en.md new file mode 100644 index 00000000000000..a2bdfd6e8129f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-esberto_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English esberto_small RoBertaEmbeddings from sgugger +author: John Snow Labs +name: esberto_small +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esberto_small` is a English model originally trained by sgugger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esberto_small_en_5.5.0_3.0_1725925218927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esberto_small_en_5.5.0_3.0_1725925218927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("esberto_small","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("esberto_small","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esberto_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.3 MB| + +## References + +https://huggingface.co/sgugger/esberto-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-f_roberta_classifier2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-f_roberta_classifier2_pipeline_en.md new file mode 100644 index 00000000000000..c83479d1ed39d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-f_roberta_classifier2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English f_roberta_classifier2_pipeline pipeline RoBertaForSequenceClassification from James-kc-min +author: John Snow Labs +name: f_roberta_classifier2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`f_roberta_classifier2_pipeline` is a English model originally trained by James-kc-min. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/f_roberta_classifier2_pipeline_en_5.5.0_3.0_1725911319934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/f_roberta_classifier2_pipeline_en_5.5.0_3.0_1725911319934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("f_roberta_classifier2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("f_roberta_classifier2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|f_roberta_classifier2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/James-kc-min/F_Roberta_classifier2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-facility_maint_albert_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-facility_maint_albert_model_pipeline_en.md new file mode 100644 index 00000000000000..bd2faf0cdbda70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-facility_maint_albert_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English facility_maint_albert_model_pipeline pipeline AlbertForSequenceClassification from Akash24 +author: John Snow Labs +name: facility_maint_albert_model_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`facility_maint_albert_model_pipeline` is a English model originally trained by Akash24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/facility_maint_albert_model_pipeline_en_5.5.0_3.0_1725901323606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/facility_maint_albert_model_pipeline_en_5.5.0_3.0_1725901323606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("facility_maint_albert_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("facility_maint_albert_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|facility_maint_albert_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.3 MB| + +## References + +https://huggingface.co/Akash24/facility_maint_albert_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-finetuned_mixed6_3epochs_en.md b/docs/_posts/ahmedlone127/2024-09-09-finetuned_mixed6_3epochs_en.md new file mode 100644 index 00000000000000..3e1ba03629c390 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-finetuned_mixed6_3epochs_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English finetuned_mixed6_3epochs MPNetEmbeddings from jhsmith +author: John Snow Labs +name: finetuned_mixed6_3epochs +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_mixed6_3epochs` is a English model originally trained by jhsmith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_mixed6_3epochs_en_5.5.0_3.0_1725874208871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_mixed6_3epochs_en_5.5.0_3.0_1725874208871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("finetuned_mixed6_3epochs","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("finetuned_mixed6_3epochs","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
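+
+As an optional follow-up (assuming the Python example above has been executed), the sentence-level vector produced by this MPNet model can be retrieved from the `embeddings` column:
+
+```python
+# Each annotation in "embeddings" carries the sentence vector in its `embeddings` field.
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_vector").show(1, truncate=80)
+```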
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_mixed6_3epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/jhsmith/finetuned_mixed6_3epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline_en.md new file mode 100644 index 00000000000000..815ee9d4c94328 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline pipeline MarianTransformer from beanslmao +author: John Snow Labs +name: hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline` is a English model originally trained by beanslmao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline_en_5.5.0_3.0_1725840141124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline_en_5.5.0_3.0_1725840141124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
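+
+The snippet above assumes a running Spark NLP session and a DataFrame `df` with a `text` column. A minimal, self-contained sketch (the sample sentence is only illustrative):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+# The pretrained pipeline reads its input from a column named "text"
+df = spark.createDataFrame([["Me encanta usar Spark NLP."]]).toDF("text")
+
+pipeline = PretrainedPipeline("hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```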
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hensinki_english_spanish_finetuned_spanish_tonga_tonga_islands_english_tateoba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.3 MB| + +## References + +https://huggingface.co/beanslmao/hensinki-en-es-finetuned-spanish-to-english-tateoba + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-indic_bert_finetuned_legal_try_with_more_data_en.md b/docs/_posts/ahmedlone127/2024-09-09-indic_bert_finetuned_legal_try_with_more_data_en.md new file mode 100644 index 00000000000000..a21e427fae6bf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-indic_bert_finetuned_legal_try_with_more_data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English indic_bert_finetuned_legal_try_with_more_data AlbertForSequenceClassification from PoptropicaSahil +author: John Snow Labs +name: indic_bert_finetuned_legal_try_with_more_data +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indic_bert_finetuned_legal_try_with_more_data` is a English model originally trained by PoptropicaSahil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indic_bert_finetuned_legal_try_with_more_data_en_5.5.0_3.0_1725924811759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indic_bert_finetuned_legal_try_with_more_data_en_5.5.0_3.0_1725924811759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = AlbertForSequenceClassification.pretrained("indic_bert_finetuned_legal_try_with_more_data","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = AlbertForSequenceClassification.pretrained("indic_bert_finetuned_legal_try_with_more_data", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
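+
+After the pipeline runs, the predicted label for each row is stored in the `class` column configured above. A quick inspection sketch (column names as in the example):
+
+```python
+# "class.result" holds the predicted label(s) for each input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+```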
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indic_bert_finetuned_legal_try_with_more_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|127.7 MB| + +## References + +https://huggingface.co/PoptropicaSahil/indic-bert-finetuned-legal_try_with_more_data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-iwslt17_marian_small_ctx4_cwd4_english_french_en.md b/docs/_posts/ahmedlone127/2024-09-09-iwslt17_marian_small_ctx4_cwd4_english_french_en.md new file mode 100644 index 00000000000000..01e1c52d909859 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-iwslt17_marian_small_ctx4_cwd4_english_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English iwslt17_marian_small_ctx4_cwd4_english_french MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_small_ctx4_cwd4_english_french +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_small_ctx4_cwd4_english_french` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx4_cwd4_english_french_en_5.5.0_3.0_1725864931202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx4_cwd4_english_french_en_5.5.0_3.0_1725864931202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx4_cwd4_english_french","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx4_cwd4_english_french","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
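+
+The translated text ends up in the `translation` column set on the MarianTransformer above. A short inspection sketch (column names as in the example):
+
+```python
+# One translated string per sentence detected in the input
+pipelineDF.selectExpr("explode(translation.result) as translated").show(truncate=False)
+```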
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_small_ctx4_cwd4_english_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-small-ctx4-cwd4-en-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-jerteh_355_pipeline_sr.md b/docs/_posts/ahmedlone127/2024-09-09-jerteh_355_pipeline_sr.md new file mode 100644 index 00000000000000..95632b932c586b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-jerteh_355_pipeline_sr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Serbian jerteh_355_pipeline pipeline RoBertaEmbeddings from jerteh +author: John Snow Labs +name: jerteh_355_pipeline +date: 2024-09-09 +tags: [sr, open_source, pipeline, onnx] +task: Embeddings +language: sr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh_355_pipeline` is a Serbian model originally trained by jerteh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh_355_pipeline_sr_5.5.0_3.0_1725910826242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh_355_pipeline_sr_5.5.0_3.0_1725910826242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh_355_pipeline", lang = "sr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh_355_pipeline", lang = "sr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh_355_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sr| +|Size:|1.3 GB| + +## References + +https://huggingface.co/jerteh/Jerteh-355 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-keyword_finetuned_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-09-keyword_finetuned_embedding_en.md new file mode 100644 index 00000000000000..51df66e9d823e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-keyword_finetuned_embedding_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English keyword_finetuned_embedding MPNetEmbeddings from jinquan122 +author: John Snow Labs +name: keyword_finetuned_embedding +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`keyword_finetuned_embedding` is a English model originally trained by jinquan122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/keyword_finetuned_embedding_en_5.5.0_3.0_1725897368504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/keyword_finetuned_embedding_en_5.5.0_3.0_1725897368504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("keyword_finetuned_embedding","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("keyword_finetuned_embedding","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|keyword_finetuned_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/jinquan122/keyword_finetuned_embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-lab1_random_reshphil_en.md b/docs/_posts/ahmedlone127/2024-09-09-lab1_random_reshphil_en.md new file mode 100644 index 00000000000000..f865b50fce994f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-lab1_random_reshphil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_random_reshphil MarianTransformer from Reshphil +author: John Snow Labs +name: lab1_random_reshphil +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_reshphil` is a English model originally trained by Reshphil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_reshphil_en_5.5.0_3.0_1725914219686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_reshphil_en_5.5.0_3.0_1725914219686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("lab1_random_reshphil","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("lab1_random_reshphil","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
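+
+When only a few strings need translating, the fitted `pipelineModel` from the snippet above can also be wrapped in a `LightPipeline`, which skips the DataFrame round trip (a sketch reusing the variables defined above):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["translation"])
+```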
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_reshphil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/Reshphil/lab1_random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-legal_swiss_roberta_base_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-09-legal_swiss_roberta_base_pipeline_de.md new file mode 100644 index 00000000000000..cfea3deec9c165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-legal_swiss_roberta_base_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German legal_swiss_roberta_base_pipeline pipeline RoBertaEmbeddings from joelniklaus +author: John Snow Labs +name: legal_swiss_roberta_base_pipeline +date: 2024-09-09 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_swiss_roberta_base_pipeline` is a German model originally trained by joelniklaus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_swiss_roberta_base_pipeline_de_5.5.0_3.0_1725884662156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_swiss_roberta_base_pipeline_de_5.5.0_3.0_1725884662156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_swiss_roberta_base_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_swiss_roberta_base_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_swiss_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|692.1 MB| + +## References + +https://huggingface.co/joelniklaus/legal-swiss-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-local_distilbert_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-local_distilbert_model_pipeline_en.md new file mode 100644 index 00000000000000..5fd63a6bac4e6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-local_distilbert_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English local_distilbert_model_pipeline pipeline DistilBertForTokenClassification from senthil2002 +author: John Snow Labs +name: local_distilbert_model_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`local_distilbert_model_pipeline` is a English model originally trained by senthil2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/local_distilbert_model_pipeline_en_5.5.0_3.0_1725890013514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/local_distilbert_model_pipeline_en_5.5.0_3.0_1725890013514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("local_distilbert_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("local_distilbert_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
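+
+For quick experiments without building a DataFrame, `PretrainedPipeline` also exposes `annotate`, which takes a plain string and returns a dictionary keyed by the pipeline's output columns (a sketch; the sample text is only illustrative):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("local_distilbert_model_pipeline", lang = "en")
+result = pipeline.annotate("John works at a hospital in New York.")
+print(result.keys())  # lists the output columns produced by the pipeline stages
+```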
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|local_distilbert_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/senthil2002/local_distilbert_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-maltese_coref_english_arabic_gender_exp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-maltese_coref_english_arabic_gender_exp_pipeline_en.md new file mode 100644 index 00000000000000..214bfd3e8fa03a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-maltese_coref_english_arabic_gender_exp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maltese_coref_english_arabic_gender_exp_pipeline pipeline MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_arabic_gender_exp_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_arabic_gender_exp_pipeline` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_arabic_gender_exp_pipeline_en_5.5.0_3.0_1725865409099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_arabic_gender_exp_pipeline_en_5.5.0_3.0_1725865409099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_coref_english_arabic_gender_exp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_coref_english_arabic_gender_exp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_arabic_gender_exp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|529.2 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_ar_gender_exp + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline_en.md new file mode 100644 index 00000000000000..289d5ff48426a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline pipeline MarianTransformer from SS1020 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline` is a English model originally trained by SS1020. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline_en_5.5.0_3.0_1725913423222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline_en_5.5.0_3.0_1725913423222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ss1020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/SS1020/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-ope_bert_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-ope_bert_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..db2b2da0c2ba77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-ope_bert_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ope_bert_v1_2_pipeline pipeline DistilBertEmbeddings from RyotaroOKabe +author: John Snow Labs +name: ope_bert_v1_2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ope_bert_v1_2_pipeline` is a English model originally trained by RyotaroOKabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ope_bert_v1_2_pipeline_en_5.5.0_3.0_1725868415150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ope_bert_v1_2_pipeline_en_5.5.0_3.0_1725868415150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ope_bert_v1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ope_bert_v1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ope_bert_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.4 MB| + +## References + +https://huggingface.co/RyotaroOKabe/ope_bert_v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline_en.md new file mode 100644 index 00000000000000..9bd91e52a5fcd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline pipeline MarianTransformer from meghazisofiane +author: John Snow Labs +name: opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline` is a English model originally trained by meghazisofiane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline_en_5.5.0_3.0_1725864837091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline_en_5.5.0_3.0_1725864837091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_arabic_finetuned_english_tonga_tonga_islands_arabic_test2_instances_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.8 MB| + +## References + +https://huggingface.co/meghazisofiane/opus-mt-en-ar-finetuned-en-to-ar-test2-instances + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_dutch_finetuned_combined_38_train_val_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_dutch_finetuned_combined_38_train_val_en.md new file mode 100644 index 00000000000000..1b33e9cf79a3ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_dutch_finetuned_combined_38_train_val_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_dutch_finetuned_combined_38_train_val MarianTransformer from kalcho100 +author: John Snow Labs +name: opus_maltese_english_dutch_finetuned_combined_38_train_val +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_dutch_finetuned_combined_38_train_val` is a English model originally trained by kalcho100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_dutch_finetuned_combined_38_train_val_en_5.5.0_3.0_1725913408194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_dutch_finetuned_combined_38_train_val_en_5.5.0_3.0_1725913408194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_dutch_finetuned_combined_38_train_val","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_dutch_finetuned_combined_38_train_val","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_dutch_finetuned_combined_38_train_val| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|548.8 MB| + +## References + +https://huggingface.co/kalcho100/opus-mt-en-nl-finetuned_combined_38_train_val \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline_en.md new file mode 100644 index 00000000000000..1e8b1b6cdb612b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline pipeline MarianTransformer from kalcho100 +author: John Snow Labs +name: opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline` is a English model originally trained by kalcho100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline_en_5.5.0_3.0_1725913436964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline_en_5.5.0_3.0_1725913436964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_dutch_finetuned_combined_38_train_val_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|549.4 MB| + +## References + +https://huggingface.co/kalcho100/opus-mt-en-nl-finetuned_combined_38_train_val + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_german_bds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_german_bds_pipeline_en.md new file mode 100644 index 00000000000000..13cc0b5e0caeb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_german_bds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_german_bds_pipeline pipeline MarianTransformer from Anhptp +author: John Snow Labs +name: opus_maltese_english_german_bds_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_german_bds_pipeline` is a English model originally trained by Anhptp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_german_bds_pipeline_en_5.5.0_3.0_1725840298146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_german_bds_pipeline_en_5.5.0_3.0_1725840298146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_german_bds_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_german_bds_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_german_bds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|500.0 MB| + +## References + +https://huggingface.co/Anhptp/opus-mt-en-de-BDS + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline_en.md new file mode 100644 index 00000000000000..b02a7638e63245 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline pipeline MarianTransformer from dlyfar +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline` is a English model originally trained by dlyfar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline_en_5.5.0_3.0_1725891362439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline_en_5.5.0_3.0_1725891362439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_dlyfar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/dlyfar/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian_en.md new file mode 100644 index 00000000000000..5177aa342ae7f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian MarianTransformer from MicMer17 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian` is a English model originally trained by MicMer17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian_en_5.5.0_3.0_1725914183652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian_en_5.5.0_3.0_1725914183652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_finetuned_english_tonga_tonga_islands_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/MicMer17/opus-mt-en-ro-finetuned-en-to-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline_en.md new file mode 100644 index 00000000000000..f452d160567ed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline pipeline MarianTransformer from shuvayanti +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline` is a English model originally trained by shuvayanti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline_en_5.5.0_3.0_1725864908698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline_en_5.5.0_3.0_1725864908698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shuvayanti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/shuvayanti/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_southern_sotho_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_southern_sotho_en.md new file mode 100644 index 00000000000000..aedd3d4dcf5b4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_southern_sotho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_southern_sotho MarianTransformer from cw1521 +author: John Snow Labs +name: opus_maltese_english_southern_sotho +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_southern_sotho` is a English model originally trained by cw1521. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_southern_sotho_en_5.5.0_3.0_1725891253151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_southern_sotho_en_5.5.0_3.0_1725891253151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_southern_sotho","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_southern_sotho","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_southern_sotho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/cw1521/opus-mt-en-st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_finetuned_english_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_finetuned_english_spanish_en.md new file mode 100644 index 00000000000000..2ac4a3f490524d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_finetuned_english_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_finetuned_english_spanish MarianTransformer from rossanez +author: John Snow Labs +name: opus_maltese_finetuned_english_spanish +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_finetuned_english_spanish` is a English model originally trained by rossanez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_finetuned_english_spanish_en_5.5.0_3.0_1725864703618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_finetuned_english_spanish_en_5.5.0_3.0_1725864703618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_finetuned_english_spanish","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_finetuned_english_spanish","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_finetuned_english_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.0 MB| + +## References + +https://huggingface.co/rossanez/opus-mt-finetuned-en-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english_en.md new file mode 100644 index 00000000000000..7fb5e46395b4f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english MarianTransformer from VFiona +author: John Snow Labs +name: opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english` is a English model originally trained by VFiona. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english_en_5.5.0_3.0_1725913842897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english_en_5.5.0_3.0_1725913842897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split each document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate the detected sentences
+marian = MarianTransformer.pretrained("opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+// split each document into sentences before translation
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+// translate the detected sentences
+val marian = MarianTransformer.pretrained("opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_italian_english_finetuned_20000_italian_tonga_tonga_islands_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|625.0 MB| + +## References + +https://huggingface.co/VFiona/opus-mt-it-en-finetuned_20000-it-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_turkish_tonga_tonga_islands_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_turkish_tonga_tonga_islands_english_pipeline_en.md new file mode 100644 index 00000000000000..79bfba39af0fad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_turkish_tonga_tonga_islands_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_turkish_tonga_tonga_islands_english_pipeline pipeline MarianTransformer from ckartal +author: John Snow Labs +name: opus_maltese_turkish_tonga_tonga_islands_english_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_turkish_tonga_tonga_islands_english_pipeline` is a English model originally trained by ckartal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_turkish_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1725863788513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_turkish_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1725863788513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_turkish_tonga_tonga_islands_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_turkish_tonga_tonga_islands_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
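+
+The example above refers to an input DataFrame `df` that is not created on this page. A minimal sketch of preparing one, assuming an active Spark NLP session named `spark` and a text column named `text`:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Build the input DataFrame the example refers to as `df` (sample Turkish text).
+df = spark.createDataFrame([["Bu bir deneme cumlesidir."]]).toDF("text")
+pipeline = PretrainedPipeline("opus_maltese_turkish_tonga_tonga_islands_english_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```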
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_turkish_tonga_tonga_islands_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|525.8 MB| + +## References + +https://huggingface.co/ckartal/opus-mt-tr-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-output_sotseth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-output_sotseth_pipeline_en.md new file mode 100644 index 00000000000000..a4d4d7b750455c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-output_sotseth_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English output_sotseth_pipeline pipeline RoBertaForSequenceClassification from sotseth +author: John Snow Labs +name: output_sotseth_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output_sotseth_pipeline` is a English model originally trained by sotseth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_sotseth_pipeline_en_5.5.0_3.0_1725903854473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_sotseth_pipeline_en_5.5.0_3.0_1725903854473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("output_sotseth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("output_sotseth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output_sotseth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/sotseth/output + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-q2d_origin_re_5_en.md b/docs/_posts/ahmedlone127/2024-09-09-q2d_origin_re_5_en.md new file mode 100644 index 00000000000000..22b3baeb87af63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-q2d_origin_re_5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English q2d_origin_re_5 MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_origin_re_5 +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_origin_re_5` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_origin_re_5_en_5.5.0_3.0_1725897099342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_origin_re_5_en_5.5.0_3.0_1725897099342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("q2d_origin_re_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("q2d_origin_re_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
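+
+MPNetEmbeddings produces a single sentence-level vector per input row. A minimal sketch of pulling the vectors out of the `pipelineDF` DataFrame from the example above:
+
+```python
+# Hedged sketch: assumes `pipelineDF` from the example above.
+pipelineDF.selectExpr("explode(embeddings.embeddings) as embedding").show(1, truncate=80)
+```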
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_origin_re_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_origin_re_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-q2d_orire_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-q2d_orire_5_pipeline_en.md new file mode 100644 index 00000000000000..a0e6b3b82e3462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-q2d_orire_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2d_orire_5_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_orire_5_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_orire_5_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_orire_5_pipeline_en_5.5.0_3.0_1725896583705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_orire_5_pipeline_en_5.5.0_3.0_1725896583705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2d_orire_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2d_orire_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_orire_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_orire_5 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-q2d_re_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-q2d_re_5_pipeline_en.md new file mode 100644 index 00000000000000..94192aa7aae418 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-q2d_re_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2d_re_5_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_re_5_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_re_5_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_re_5_pipeline_en_5.5.0_3.0_1725874146451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_re_5_pipeline_en_5.5.0_3.0_1725874146451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2d_re_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2d_re_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_re_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_re_5 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-qqp_microsoft_deberta_v3_large_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-09-qqp_microsoft_deberta_v3_large_seed_2_en.md new file mode 100644 index 00000000000000..393aad4d225c65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-qqp_microsoft_deberta_v3_large_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English qqp_microsoft_deberta_v3_large_seed_2 DeBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: qqp_microsoft_deberta_v3_large_seed_2 +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qqp_microsoft_deberta_v3_large_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qqp_microsoft_deberta_v3_large_seed_2_en_5.5.0_3.0_1725880372833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qqp_microsoft_deberta_v3_large_seed_2_en_5.5.0_3.0_1725880372833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("qqp_microsoft_deberta_v3_large_seed_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("qqp_microsoft_deberta_v3_large_seed_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
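+
+The predicted labels land in the `class` annotation column. A minimal sketch of reading them back, assuming the `pipelineDF` DataFrame from the example above:
+
+```python
+# Hedged sketch: one predicted label per input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```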
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qqp_microsoft_deberta_v3_large_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/utahnlp/qqp_microsoft_deberta-v3-large_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-quberta_qu.md b/docs/_posts/ahmedlone127/2024-09-09-quberta_qu.md new file mode 100644 index 00000000000000..17855a02a62a67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-quberta_qu.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Quechua quberta RoBertaEmbeddings from Llamacha +author: John Snow Labs +name: quberta +date: 2024-09-09 +tags: [qu, open_source, onnx, embeddings, roberta] +task: Embeddings +language: qu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quberta` is a Quechua model originally trained by Llamacha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quberta_qu_5.5.0_3.0_1725882883821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quberta_qu_5.5.0_3.0_1725882883821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("quberta","qu") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("quberta","qu") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
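+
+RoBertaEmbeddings returns one vector per token, so tokens and vectors can be selected side by side. A minimal sketch, assuming the `pipelineDF` DataFrame produced by the example above:
+
+```python
+# Hedged sketch: token-level embeddings for each input row.
+pipelineDF.selectExpr("token.result as tokens", "embeddings.embeddings as vectors").show(1, truncate=80)
+```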
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|qu| +|Size:|311.1 MB| + +## References + +https://huggingface.co/Llamacha/QuBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-roberta_base_catalan_v2_ca.md b/docs/_posts/ahmedlone127/2024-09-09-roberta_base_catalan_v2_ca.md new file mode 100644 index 00000000000000..2d125b479b4258 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-roberta_base_catalan_v2_ca.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Catalan, Valencian roberta_base_catalan_v2 RoBertaEmbeddings from projecte-aina +author: John Snow Labs +name: roberta_base_catalan_v2 +date: 2024-09-09 +tags: [ca, open_source, onnx, embeddings, roberta] +task: Embeddings +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_catalan_v2` is a Catalan, Valencian model originally trained by projecte-aina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_catalan_v2_ca_5.5.0_3.0_1725860994038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_catalan_v2_ca_5.5.0_3.0_1725860994038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_catalan_v2","ca") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_catalan_v2","ca") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_catalan_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|ca| +|Size:|295.1 MB| + +## References + +https://huggingface.co/projecte-aina/roberta-base-ca-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-roberta_reman_en.md b/docs/_posts/ahmedlone127/2024-09-09-roberta_reman_en.md new file mode 100644 index 00000000000000..daa1bc52ccc1b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-roberta_reman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_reman RoBertaForSequenceClassification from gustavecortal +author: John Snow Labs +name: roberta_reman +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_reman` is a English model originally trained by gustavecortal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_reman_en_5.5.0_3.0_1725911841889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_reman_en_5.5.0_3.0_1725911841889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_reman","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_reman", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_reman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/gustavecortal/roberta_reman \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-southern_sotho_all_mpnet_finetuned_english_1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-southern_sotho_all_mpnet_finetuned_english_1000_pipeline_en.md new file mode 100644 index 00000000000000..e85ec579f182f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-southern_sotho_all_mpnet_finetuned_english_1000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English southern_sotho_all_mpnet_finetuned_english_1000_pipeline pipeline MPNetEmbeddings from danfeg +author: John Snow Labs +name: southern_sotho_all_mpnet_finetuned_english_1000_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`southern_sotho_all_mpnet_finetuned_english_1000_pipeline` is a English model originally trained by danfeg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/southern_sotho_all_mpnet_finetuned_english_1000_pipeline_en_5.5.0_3.0_1725897005304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/southern_sotho_all_mpnet_finetuned_english_1000_pipeline_en_5.5.0_3.0_1725897005304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("southern_sotho_all_mpnet_finetuned_english_1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("southern_sotho_all_mpnet_finetuned_english_1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
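+
+The `df` referenced above can be any DataFrame with a `text` column. A minimal sketch of building one and inspecting the output schema, assuming an active Spark NLP session named `spark`:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("southern_sotho_all_mpnet_finetuned_english_1000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```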
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|southern_sotho_all_mpnet_finetuned_english_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/danfeg/ST-ALL-MPNET_Finetuned-EN-1000 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-tatoeba_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-09-tatoeba_finetuned_en.md new file mode 100644 index 00000000000000..4524074f22c455 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-tatoeba_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tatoeba_finetuned MarianTransformer from muibk +author: John Snow Labs +name: tatoeba_finetuned +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tatoeba_finetuned` is a English model originally trained by muibk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tatoeba_finetuned_en_5.5.0_3.0_1725865032066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tatoeba_finetuned_en_5.5.0_3.0_1725865032066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split each document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate the detected sentences
+marian = MarianTransformer.pretrained("tatoeba_finetuned","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+// split each document into sentences before translation
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+// translate the detected sentences
+val marian = MarianTransformer.pretrained("tatoeba_finetuned","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tatoeba_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|537.6 MB| + +## References + +https://huggingface.co/muibk/tatoeba_finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-training_model_en.md b/docs/_posts/ahmedlone127/2024-09-09-training_model_en.md new file mode 100644 index 00000000000000..6b9e27ec0de073 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-training_model_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English training_model DistilBertForSequenceClassification from Brecon +author: John Snow Labs +name: training_model +date: 2024-09-09 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`training_model` is a English model originally trained by Brecon. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/training_model_en_5.5.0_3.0_1725869243182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/training_model_en_5.5.0_3.0_1725869243182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("training_model","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("training_model","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
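+
+The predicted label for each row can then be read back from the `class` column of the `result` DataFrame produced above; a minimal sketch:
+
+```python
+# Hedged sketch: assumes the `result` DataFrame from the example above.
+result.select("text", "class.result").show(truncate=False)
+```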
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|training_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/Brecon/training_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-training_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-training_model_pipeline_en.md new file mode 100644 index 00000000000000..a4ada19e146402 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-training_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English training_model_pipeline pipeline DistilBertForQuestionAnswering from lwq1010 +author: John Snow Labs +name: training_model_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`training_model_pipeline` is a English model originally trained by lwq1010. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/training_model_pipeline_en_5.5.0_3.0_1725869255869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/training_model_pipeline_en_5.5.0_3.0_1725869255869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("training_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("training_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
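+
+This pipeline starts with a MultiDocumentAssembler, so the input DataFrame `df` is expected to carry both a question and its context. The column names below are assumptions for illustration; inspect the loaded pipeline's stages to confirm what it expects:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hedged sketch: the "question" and "context" column names are assumptions.
+df = spark.createDataFrame(
+    [["What is Spark NLP?", "Spark NLP is an NLP library built on top of Apache Spark."]]
+).toDF("question", "context")
+pipeline = PretrainedPipeline("training_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```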
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|training_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/lwq1010/training_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-translation_eng_ger_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-translation_eng_ger_pipeline_en.md new file mode 100644 index 00000000000000..d5421bd49366dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-translation_eng_ger_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English translation_eng_ger_pipeline pipeline MarianTransformer from perceptron-743 +author: John Snow Labs +name: translation_eng_ger_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translation_eng_ger_pipeline` is a English model originally trained by perceptron-743. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translation_eng_ger_pipeline_en_5.5.0_3.0_1725840648475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translation_eng_ger_pipeline_en_5.5.0_3.0_1725840648475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("translation_eng_ger_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("translation_eng_ger_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translation_eng_ger_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|499.8 MB| + +## References + +https://huggingface.co/perceptron-743/translation-eng-ger + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_balance_vietnam_aug_replace_bert_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_balance_vietnam_aug_replace_bert_en.md new file mode 100644 index 00000000000000..449a4745da17c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_balance_vietnam_aug_replace_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_bert XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_bert +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_bert` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_bert_en_5.5.0_3.0_1725871303946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_bert_en_5.5.0_3.0_1725871303946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_bert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_bert", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
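+
+For quick checks on individual strings, the fitted pipeline can also be wrapped in a LightPipeline instead of building a DataFrame. A minimal sketch, assuming the `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict keyed by output column; "class" holds the predicted label.
+print(light.annotate("I love spark-nlp")["class"])
+```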
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_marc_begar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_marc_begar_pipeline_en.md new file mode 100644 index 00000000000000..0e0268c788a98e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_marc_begar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_begar_pipeline pipeline XlmRoBertaForSequenceClassification from begar +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_begar_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_begar_pipeline` is a English model originally trained by begar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_begar_pipeline_en_5.5.0_3.0_1725907650139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_begar_pipeline_en_5.5.0_3.0_1725907650139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_begar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_begar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_begar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/begar/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_all_yezune_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_all_yezune_en.md new file mode 100644 index 00000000000000..19eaa7eb7f47ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_all_yezune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_yezune XlmRoBertaForTokenClassification from yezune +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_yezune +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_yezune` is a English model originally trained by yezune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yezune_en_5.5.0_3.0_1725922811018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yezune_en_5.5.0_3.0_1725922811018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_yezune","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_yezune", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
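+
+The predicted entity tags line up one-to-one with the tokens, so both columns can be selected together. A minimal sketch, assuming the `pipelineDF` DataFrame from the example above:
+
+```python
+# Hedged sketch: one predicted tag per token in the `ner` column.
+pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
+```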
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_yezune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/yezune/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_english_the_neural_networker_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_english_the_neural_networker_en.md new file mode 100644 index 00000000000000..dd10ad0f06cf42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_english_the_neural_networker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_the_neural_networker XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_the_neural_networker +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_the_neural_networker` is a English model originally trained by the-neural-networker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_the_neural_networker_en_5.5.0_3.0_1725895352985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_the_neural_networker_en_5.5.0_3.0_1725895352985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_the_neural_networker","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_the_neural_networker", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_the_neural_networker| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|836.4 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_french_noveled_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_french_noveled_en.md new file mode 100644 index 00000000000000..407239a176c355 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_french_noveled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_noveled XlmRoBertaForTokenClassification from Noveled +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_noveled +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_noveled` is a English model originally trained by Noveled. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_noveled_en_5.5.0_3.0_1725919151928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_noveled_en_5.5.0_3.0_1725919151928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_noveled","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_noveled", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_noveled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Noveled/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_italian_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_italian_en.md new file mode 100644 index 00000000000000..b5b879631c60d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_italian XlmRoBertaForTokenClassification from Ferro +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_italian +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_italian` is a English model originally trained by Ferro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_italian_en_5.5.0_3.0_1725923102592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_italian_en_5.5.0_3.0_1725923102592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_italian","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_italian", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|856.7 MB| + +## References + +https://huggingface.co/Ferro/xlm-roberta-base-finetuned-panx-de-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_nes74_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_nes74_en.md new file mode 100644 index 00000000000000..0e1b63f50ba6d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_nes74_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nes74 XlmRoBertaForTokenClassification from nes74 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nes74 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_nes74` is a English model originally trained by nes74. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nes74_en_5.5.0_3.0_1725919160199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nes74_en_5.5.0_3.0_1725919160199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nes74","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nes74", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nes74| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/nes74/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_buruzaemon_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_buruzaemon_en.md new file mode 100644 index 00000000000000..7eb6b32e9d1a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_buruzaemon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_buruzaemon XlmRoBertaForTokenClassification from buruzaemon +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_buruzaemon +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_buruzaemon` is a English model originally trained by buruzaemon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_buruzaemon_en_5.5.0_3.0_1725894869773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_buruzaemon_en_5.5.0_3.0_1725894869773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_buruzaemon","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_buruzaemon", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_buruzaemon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/buruzaemon/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_robkayinto_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_robkayinto_en.md new file mode 100644 index 00000000000000..64510e41f0b28a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_robkayinto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_robkayinto XlmRoBertaForTokenClassification from robkayinto +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_robkayinto +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_robkayinto` is a English model originally trained by robkayinto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_robkayinto_en_5.5.0_3.0_1725918785058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_robkayinto_en_5.5.0_3.0_1725918785058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_robkayinto","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_robkayinto", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_robkayinto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/robkayinto/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..c0176a229cb4ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline pipeline MPNetEmbeddings from binhcode25-finetuned +author: John Snow Labs +name: all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline` is a English model originally trained by binhcode25-finetuned. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline_en_5.5.0_3.0_1725936391676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline_en_5.5.0_3.0_1725936391676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
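+
+Here `df` stands for any Spark DataFrame with a `text` column; the snippet below is a minimal sketch of preparing such an input, with the sample sentence purely illustrative.
+
+```python
+# Illustrative input: the pipeline expects a DataFrame with a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the columns added by the pipeline stages
+```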
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_fine_tuned_epochs_1_binhcode25_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/binhcode25-finetuned/all-mpnet-base-v2-fine-tuned-epochs-1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline_en.md new file mode 100644 index 00000000000000..1977d89aebf2ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline pipeline MPNetEmbeddings from luiz-and-robert-thesis +author: John Snow Labs +name: all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline` is a English model originally trained by luiz-and-robert-thesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline_en_5.5.0_3.0_1725969880667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline_en_5.5.0_3.0_1725969880667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_lr_2e_7_margin_1_epoch_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/luiz-and-robert-thesis/all-mpnet-base-v2-lr-2e-7-margin-1-epoch-1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_sts_tomaarsen_en.md b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_sts_tomaarsen_en.md new file mode 100644 index 00000000000000..84cb3c84cc16c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_sts_tomaarsen_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_mpnet_base_v2_sts_tomaarsen MPNetEmbeddings from tomaarsen +author: John Snow Labs +name: all_mpnet_base_v2_sts_tomaarsen +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_sts_tomaarsen` is a English model originally trained by tomaarsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_sts_tomaarsen_en_5.5.0_3.0_1725995228728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_sts_tomaarsen_en_5.5.0_3.0_1725995228728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_sts_tomaarsen","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_sts_tomaarsen","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
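+
+Once transformed, the sentence embeddings can be pulled out of the `embeddings` output column; the snippet below is a minimal sketch assuming the pipeline defined above.
+
+```python
+# The "embeddings" field of each annotation holds the sentence-embedding vector.
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(truncate=False)
+```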
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_sts_tomaarsen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/tomaarsen/all-mpnet-base-v2-sts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1_en.md b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1_en.md new file mode 100644 index 00000000000000..693d75c2a02b39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1 MPNetEmbeddings from luiz-and-robert-thesis +author: John Snow Labs +name: all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1` is a English model originally trained by luiz-and-robert-thesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1_en_5.5.0_3.0_1725994928033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1_en_5.5.0_3.0_1725994928033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_warmup_cosine_margin_1_epoch_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/luiz-and-robert-thesis/all-mpnet-base-v2-warmup-cosine-margin-1-epoch-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-autonlp_predict_roi_1_29797730_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-autonlp_predict_roi_1_29797730_pipeline_en.md new file mode 100644 index 00000000000000..1623d8364c3df4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-autonlp_predict_roi_1_29797730_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autonlp_predict_roi_1_29797730_pipeline pipeline RoBertaForSequenceClassification from ds198799 +author: John Snow Labs +name: autonlp_predict_roi_1_29797730_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autonlp_predict_roi_1_29797730_pipeline` is a English model originally trained by ds198799. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autonlp_predict_roi_1_29797730_pipeline_en_5.5.0_3.0_1725965883641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autonlp_predict_roi_1_29797730_pipeline_en_5.5.0_3.0_1725965883641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autonlp_predict_roi_1_29797730_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autonlp_predict_roi_1_29797730_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autonlp_predict_roi_1_29797730_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.2 MB| + +## References + +https://huggingface.co/ds198799/autonlp-predict_ROI_1-29797730 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-babyberta_aochildes_french_without_masking_finetuned_run1_french_squad_en.md b/docs/_posts/ahmedlone127/2024-09-10-babyberta_aochildes_french_without_masking_finetuned_run1_french_squad_en.md new file mode 100644 index 00000000000000..c1e45337dffd4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-babyberta_aochildes_french_without_masking_finetuned_run1_french_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_french_without_masking_finetuned_run1_french_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_french_without_masking_finetuned_run1_french_squad +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_french_without_masking_finetuned_run1_french_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_without_masking_finetuned_run1_french_squad_en_5.5.0_3.0_1725987617310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_without_masking_finetuned_run1_french_squad_en_5.5.0_3.0_1725987617310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_french_without_masking_finetuned_run1_french_squad","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_french_without_masking_finetuned_run1_french_squad", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
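+
+The extracted answer spans land in the `answer` output column; the snippet below is a minimal sketch assuming the pipeline defined above.
+
+```python
+# "answer.result" contains the answer text predicted for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```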
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_french_without_masking_finetuned_run1_french_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes-french-without-Masking-finetuned-run1-Fr-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-bengali_abusive_muril_bn.md b/docs/_posts/ahmedlone127/2024-09-10-bengali_abusive_muril_bn.md new file mode 100644 index 00000000000000..adb1854b766f1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-bengali_abusive_muril_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali bengali_abusive_muril BertForSequenceClassification from Hate-speech-CNERG +author: John Snow Labs +name: bengali_abusive_muril +date: 2024-09-10 +tags: [bn, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bengali_abusive_muril` is a Bengali model originally trained by Hate-speech-CNERG. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bengali_abusive_muril_bn_5.5.0_3.0_1725977004468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bengali_abusive_muril_bn_5.5.0_3.0_1725977004468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bengali_abusive_muril","bn") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bengali_abusive_muril", "bn")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
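+
+The predicted label for each input can be read from the `class` output column; the snippet below is a minimal sketch assuming the pipeline defined above.
+
+```python
+# "class.result" holds the predicted class label for each document.
+pipelineDF.select("class.result").show(truncate=False)
+```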
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bengali_abusive_muril| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|bn| +|Size:|892.6 MB| + +## References + +https://huggingface.co/Hate-speech-CNERG/bengali-abusive-MuRIL \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-bert_base_uncased_ftd_on_glue_stsb_iter_5_en.md b/docs/_posts/ahmedlone127/2024-09-10-bert_base_uncased_ftd_on_glue_stsb_iter_5_en.md new file mode 100644 index 00000000000000..598686aa5049ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-bert_base_uncased_ftd_on_glue_stsb_iter_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_ftd_on_glue_stsb_iter_5 BertForSequenceClassification from Ibrahim-Alam +author: John Snow Labs +name: bert_base_uncased_ftd_on_glue_stsb_iter_5 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ftd_on_glue_stsb_iter_5` is a English model originally trained by Ibrahim-Alam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ftd_on_glue_stsb_iter_5_en_5.5.0_3.0_1725957468615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ftd_on_glue_stsb_iter_5_en_5.5.0_3.0_1725957468615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ftd_on_glue_stsb_iter_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ftd_on_glue_stsb_iter_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ftd_on_glue_stsb_iter_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Ibrahim-Alam/bert-base-uncased_FTd_on_glue-stsb_iter-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-biobert_maelstrom_cleaned_en.md b/docs/_posts/ahmedlone127/2024-09-10-biobert_maelstrom_cleaned_en.md new file mode 100644 index 00000000000000..ff3109ad8cec5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-biobert_maelstrom_cleaned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biobert_maelstrom_cleaned BertForSequenceClassification from JuSas +author: John Snow Labs +name: biobert_maelstrom_cleaned +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_maelstrom_cleaned` is a English model originally trained by JuSas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_maelstrom_cleaned_en_5.5.0_3.0_1726000007624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_maelstrom_cleaned_en_5.5.0_3.0_1726000007624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("biobert_maelstrom_cleaned","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("biobert_maelstrom_cleaned", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_maelstrom_cleaned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.6 MB| + +## References + +https://huggingface.co/JuSas/biobert-Maelstrom-cleaned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-burmese_awesome_qa_model_josetfm_en.md b/docs/_posts/ahmedlone127/2024-09-10-burmese_awesome_qa_model_josetfm_en.md new file mode 100644 index 00000000000000..7f5f0a87f872f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-burmese_awesome_qa_model_josetfm_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_josetfm DistilBertForQuestionAnswering from joseTfm +author: John Snow Labs +name: burmese_awesome_qa_model_josetfm +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_josetfm` is a English model originally trained by joseTfm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_josetfm_en_5.5.0_3.0_1725980207674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_josetfm_en_5.5.0_3.0_1725980207674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_josetfm","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_josetfm", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_josetfm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/joseTfm/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-burmese_mps_roberta_based_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-burmese_mps_roberta_based_model_pipeline_en.md new file mode 100644 index 00000000000000..cd42578a0a4a6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-burmese_mps_roberta_based_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_mps_roberta_based_model_pipeline pipeline RoBertaEmbeddings from MS-Huang0714 +author: John Snow Labs +name: burmese_mps_roberta_based_model_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_mps_roberta_based_model_pipeline` is a English model originally trained by MS-Huang0714. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_mps_roberta_based_model_pipeline_en_5.5.0_3.0_1725930744360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_mps_roberta_based_model_pipeline_en_5.5.0_3.0_1725930744360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_mps_roberta_based_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_mps_roberta_based_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_mps_roberta_based_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/MS-Huang0714/my-MPS-roberta-based_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_pipeline_en.md new file mode 100644 index 00000000000000..ddb0272da6c6f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model_pipeline pipeline MPNetEmbeddings from tubyneto +author: John Snow Labs +name: burmese_nepal_bhasa_model_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model_pipeline` is a English model originally trained by tubyneto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_pipeline_en_5.5.0_3.0_1725936055886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_pipeline_en_5.5.0_3.0_1725936055886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_nepal_bhasa_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_nepal_bhasa_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/tubyneto/my_new_model + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-cross_all_bs160_allneg_finetuned_webnlg2020_relevance_en.md b/docs/_posts/ahmedlone127/2024-09-10-cross_all_bs160_allneg_finetuned_webnlg2020_relevance_en.md new file mode 100644 index 00000000000000..746f13f073fe22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-cross_all_bs160_allneg_finetuned_webnlg2020_relevance_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English cross_all_bs160_allneg_finetuned_webnlg2020_relevance MPNetEmbeddings from teven +author: John Snow Labs +name: cross_all_bs160_allneg_finetuned_webnlg2020_relevance +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_all_bs160_allneg_finetuned_webnlg2020_relevance` is a English model originally trained by teven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_all_bs160_allneg_finetuned_webnlg2020_relevance_en_5.5.0_3.0_1725995299616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_all_bs160_allneg_finetuned_webnlg2020_relevance_en_5.5.0_3.0_1725995299616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("cross_all_bs160_allneg_finetuned_webnlg2020_relevance","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("cross_all_bs160_allneg_finetuned_webnlg2020_relevance","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_all_bs160_allneg_finetuned_webnlg2020_relevance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/teven/cross_all_bs160_allneg_finetuned_WebNLG2020_relevance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-cross_all_mpnet_base_v2_finetuned_webnlg2017_en.md b/docs/_posts/ahmedlone127/2024-09-10-cross_all_mpnet_base_v2_finetuned_webnlg2017_en.md new file mode 100644 index 00000000000000..2b34fde95b73f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-cross_all_mpnet_base_v2_finetuned_webnlg2017_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English cross_all_mpnet_base_v2_finetuned_webnlg2017 MPNetEmbeddings from teven +author: John Snow Labs +name: cross_all_mpnet_base_v2_finetuned_webnlg2017 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_all_mpnet_base_v2_finetuned_webnlg2017` is a English model originally trained by teven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_all_mpnet_base_v2_finetuned_webnlg2017_en_5.5.0_3.0_1725995412788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_all_mpnet_base_v2_finetuned_webnlg2017_en_5.5.0_3.0_1725995412788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("cross_all_mpnet_base_v2_finetuned_webnlg2017","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("cross_all_mpnet_base_v2_finetuned_webnlg2017","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_all_mpnet_base_v2_finetuned_webnlg2017| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/teven/cross_all-mpnet-base-v2_finetuned_WebNLG2017 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en.md new file mode 100644 index 00000000000000..ea886b1e556cad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline pipeline DistilBertForSequenceClassification from BanUrsus +author: John Snow Labs +name: distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline` is a English model originally trained by BanUrsus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en_5.5.0_3.0_1726009522997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en_5.5.0_3.0_1726009522997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fillmask_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BanUrsus/distilbert-base-uncased-fillmask-finetuned-imdb-classifier_nlp-course-chapter7-section2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline_en.md new file mode 100644 index 00000000000000..519ef6174652fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline pipeline DistilBertForSequenceClassification from CocaButon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline` is a English model originally trained by CocaButon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline_en_5.5.0_3.0_1726009289651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline_en_5.5.0_3.0_1726009289651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cocabuton_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CocaButon/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_emotion_mohamedahmedae_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_emotion_mohamedahmedae_en.md new file mode 100644 index 00000000000000..abd14354875f52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_emotion_mohamedahmedae_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mohamedahmedae DistilBertForSequenceClassification from MohamedAhmedAE +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mohamedahmedae +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mohamedahmedae` is a English model originally trained by MohamedAhmedAE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mohamedahmedae_en_5.5.0_3.0_1725983514986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mohamedahmedae_en_5.5.0_3.0_1725983514986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mohamedahmedae","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mohamedahmedae", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
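+
+Once the pipeline above is fitted, the predicted emotion label for each row can be read directly from the `class` output column defined in the example; a short follow-up sketch:
+
+```python
+# Show the input text next to the predicted label(s).
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# Per-label scores are kept in the annotation metadata.
+pipelineDF.selectExpr("explode(class) as prediction") \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```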
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mohamedahmedae| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MohamedAhmedAE/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_adanish91_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_adanish91_pipeline_en.md new file mode 100644 index 00000000000000..a3dc89f8f40484 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_adanish91_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_adanish91_pipeline pipeline DistilBertEmbeddings from adanish91 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_adanish91_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_adanish91_pipeline` is a English model originally trained by adanish91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_adanish91_pipeline_en_5.5.0_3.0_1725994038251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_adanish91_pipeline_en_5.5.0_3.0_1725994038251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_adanish91_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_adanish91_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_adanish91_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/adanish91/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_squad_mondol007_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_squad_mondol007_en.md new file mode 100644 index 00000000000000..61d722ecc26313 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_squad_mondol007_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_mondol007 DistilBertForQuestionAnswering from mondol007 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_mondol007 +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_mondol007` is a English model originally trained by mondol007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_mondol007_en_5.5.0_3.0_1725960465632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_mondol007_en_5.5.0_3.0_1725960465632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_mondol007","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_mondol007", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
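+
+After the question-answering pipeline above runs, the predicted span is available in the `answer` output column defined in the example; for instance:
+
+```python
+# Show each question alongside the extracted answer text.
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```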
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_mondol007| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/mondol007/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english_en.md new file mode 100644 index 00000000000000..2da447426de2e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english_en.md @@ -0,0 +1,108 @@ +--- +layout: model +title: English DistilBertForSequenceClassification Base Uncased model +author: John Snow Labs +name: distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english +date: 2024-09-10 +tags: [distilbert, sequence_classification, open_source, en, openvino] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distilbert-base-uncased-finetuned-sst-2-english` is a English model originally trained by HuggingFace. + +## Predicted Entities + +`POSITIVE`, `NEGATIVE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english_en_5.5.0_3.0_1725955537990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english_en_5.5.0_3.0_1725955537990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +sequenceClassifier_loaded = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("class") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer,sequenceClassifier_loaded]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier_loaded = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english","en") + .setInputCols(Array("document", "token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer,sequenceClassifier_loaded)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.classify.distil_bert.glue.uncased_base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|distilbert_sequence_classifier_distilbert_base_uncased_finetuned_sst_2_english|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[token, document]|
+|Output Labels:|[label]|
+|Language:|en|
+|Size:|249.3 MB|
+|Case sensitive:|true|
+
+## References
+
+- https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english
+- https://paperswithcode.com/sota?task=Text+Classification&dataset=glue
+- https://colab.research.google.com/gist/ageron/fb2f64fb145b4bc7c49efc97e5f114d3/biasmap.ipynb
+- https://twitter.com/aureliengeron
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-10-distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline_en.md
new file mode 100644
index 00000000000000..48885f7391ec52
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-10-distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline pipeline DistilBertEmbeddings from tatakof
+author: John Snow Labs
+name: distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline
+date: 2024-09-10
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained DistilBertEmbeddings pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline` is an English model originally trained by tatakof.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline_en_5.5.0_3.0_1725993931133.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline_en_5.5.0_3.0_1725993931133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_base_spanish_uncased_finetuned_spanish_corpus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.2 MB| + +## References + +https://huggingface.co/tatakof/distillbert-base-spanish-uncased-finetuned-spanish-corpus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilroberta_base_ft_sex_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilroberta_base_ft_sex_en.md new file mode 100644 index 00000000000000..1ab5f785e20690 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilroberta_base_ft_sex_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_sex RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_sex +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_sex` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_sex_en_5.5.0_3.0_1725930791828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_sex_en_5.5.0_3.0_1725930791828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_sex","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_sex","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
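+
+The token embeddings produced above are stored as Spark NLP annotations. One way to turn them into plain Spark ML vectors is `EmbeddingsFinisher`; the sketch below is a generic pattern for the pipeline defined above, not something specific to this model:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Convert the "embeddings" annotations into one vector per token.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finished = finisher.transform(pipelineDF)
+finished.selectExpr("explode(finished_embeddings) as token_vector").show(5, truncate=80)
+```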
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_sex| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-sex \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-emoji_emoji_temporal_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-10-emoji_emoji_temporal_bertweet_large_en.md new file mode 100644 index 00000000000000..3df5f45021320c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-emoji_emoji_temporal_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_temporal_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_temporal_bertweet_large +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_temporal_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_temporal_bertweet_large_en_5.5.0_3.0_1725964904546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_temporal_bertweet_large_en_5.5.0_3.0_1725964904546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_temporal_bertweet_large","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_temporal_bertweet_large", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_temporal_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_temporal-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-finetuned_maltese_english_swedish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-finetuned_maltese_english_swedish_pipeline_en.md new file mode 100644 index 00000000000000..17699f0b6c4183 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-finetuned_maltese_english_swedish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_maltese_english_swedish_pipeline pipeline MarianTransformer from adamnavarro +author: John Snow Labs +name: finetuned_maltese_english_swedish_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_maltese_english_swedish_pipeline` is a English model originally trained by adamnavarro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_maltese_english_swedish_pipeline_en_5.5.0_3.0_1726002322847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_maltese_english_swedish_pipeline_en_5.5.0_3.0_1726002322847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_maltese_english_swedish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_maltese_english_swedish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
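+
+For quick experimentation with a single sentence, the same pretrained pipeline can also be used through `annotate` instead of `transform`; a minimal sketch (the example sentence is arbitrary, and the exact output keys depend on the pipeline stages):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("finetuned_maltese_english_swedish_pipeline", lang="en")
+
+# annotate() returns a plain Python dict with one entry per output column.
+result = pipeline.annotate("Machine translation is useful.")
+print(result.keys())
+```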
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_maltese_english_swedish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|491.8 MB| + +## References + +https://huggingface.co/adamnavarro/finetuned-mt-en-sv + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-finetuned_roberta_base_uncased_emotion_8_en.md b/docs/_posts/ahmedlone127/2024-09-10-finetuned_roberta_base_uncased_emotion_8_en.md new file mode 100644 index 00000000000000..84b41e5ac46f29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-finetuned_roberta_base_uncased_emotion_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_roberta_base_uncased_emotion_8 RoBertaForSequenceClassification from StrangeCroissant +author: John Snow Labs +name: finetuned_roberta_base_uncased_emotion_8 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_roberta_base_uncased_emotion_8` is a English model originally trained by StrangeCroissant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_roberta_base_uncased_emotion_8_en_5.5.0_3.0_1725971006409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_roberta_base_uncased_emotion_8_en_5.5.0_3.0_1725971006409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_roberta_base_uncased_emotion_8","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_roberta_base_uncased_emotion_8", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_roberta_base_uncased_emotion_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|457.0 MB| + +## References + +https://huggingface.co/StrangeCroissant/finetuned-roberta-base-uncased-emotion-8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-finetuned_sts_catalan_mpnet_base_en.md b/docs/_posts/ahmedlone127/2024-09-10-finetuned_sts_catalan_mpnet_base_en.md new file mode 100644 index 00000000000000..1e31fa185d1e9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-finetuned_sts_catalan_mpnet_base_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English finetuned_sts_catalan_mpnet_base MPNetEmbeddings from pauhidalgoo +author: John Snow Labs +name: finetuned_sts_catalan_mpnet_base +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sts_catalan_mpnet_base` is a English model originally trained by pauhidalgoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sts_catalan_mpnet_base_en_5.5.0_3.0_1725994934136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sts_catalan_mpnet_base_en_5.5.0_3.0_1725994934136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("finetuned_sts_catalan_mpnet_base","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("finetuned_sts_catalan_mpnet_base","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
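+
+Sentence-level MPNet embeddings such as these are typically compared with cosine similarity. A minimal sketch, assuming the `pipelineModel` fitted above and collecting the vectors to the driver:
+
+```python
+import numpy as np
+
+texts = spark.createDataFrame([["I love spark-nlp"], ["Spark NLP is great"]]).toDF("text")
+emb_df = pipelineModel.transform(texts)
+
+# Each row carries one sentence embedding; pull the raw float arrays locally.
+vectors = [row["embeddings"][0]["embeddings"] for row in emb_df.select("embeddings").collect()]
+
+a, b = np.array(vectors[0]), np.array(vectors[1])
+print("cosine similarity:", float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
+```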
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sts_catalan_mpnet_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|383.9 MB| + +## References + +https://huggingface.co/pauhidalgoo/finetuned-sts-ca-mpnet-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-gqa_roberta_german_legal_squad_17_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-10-gqa_roberta_german_legal_squad_17_pipeline_de.md new file mode 100644 index 00000000000000..4294095245f75f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-gqa_roberta_german_legal_squad_17_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German gqa_roberta_german_legal_squad_17_pipeline pipeline RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: gqa_roberta_german_legal_squad_17_pipeline +date: 2024-09-10 +tags: [de, open_source, pipeline, onnx] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gqa_roberta_german_legal_squad_17_pipeline` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_17_pipeline_de_5.5.0_3.0_1725987103475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_17_pipeline_de_5.5.0_3.0_1725987103475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gqa_roberta_german_legal_squad_17_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gqa_roberta_german_legal_squad_17_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gqa_roberta_german_legal_squad_17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/GQA_RoBERTa_German_legal_SQuAD_17 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-imdb_fewshot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-imdb_fewshot_pipeline_en.md new file mode 100644 index 00000000000000..831b5b81df9dda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-imdb_fewshot_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English imdb_fewshot_pipeline pipeline MPNetEmbeddings from pig4431 +author: John Snow Labs +name: imdb_fewshot_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_fewshot_pipeline` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_fewshot_pipeline_en_5.5.0_3.0_1725936780315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_fewshot_pipeline_en_5.5.0_3.0_1725936780315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdb_fewshot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdb_fewshot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_fewshot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/pig4431/IMDB_fewshot + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-incremental_semi_supervised_training_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-incremental_semi_supervised_training_base_pipeline_en.md new file mode 100644 index 00000000000000..74f841aed4424b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-incremental_semi_supervised_training_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English incremental_semi_supervised_training_base_pipeline pipeline RoBertaForSequenceClassification from bitsanlp +author: John Snow Labs +name: incremental_semi_supervised_training_base_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`incremental_semi_supervised_training_base_pipeline` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/incremental_semi_supervised_training_base_pipeline_en_5.5.0_3.0_1725970871405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/incremental_semi_supervised_training_base_pipeline_en_5.5.0_3.0_1725970871405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("incremental_semi_supervised_training_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("incremental_semi_supervised_training_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
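+
+To inspect a single prediction together with its score, `fullAnnotate` exposes the underlying annotations, including their metadata; a brief sketch in which the `class` column name is an assumption about this pipeline:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("incremental_semi_supervised_training_base_pipeline", lang="en")
+
+# fullAnnotate keeps offsets and metadata such as per-label scores.
+result = pipeline.fullAnnotate("I love spark-nlp")[0]
+for annotation in result["class"]:
+    print(annotation.result, annotation.metadata)
+```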
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|incremental_semi_supervised_training_base_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.3 GB|
+
+## References
+
+https://huggingface.co/bitsanlp/incremental-semi-supervised-training-base
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- RoBertaForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-10-medpsynit_it.md b/docs/_posts/ahmedlone127/2024-09-10-medpsynit_it.md
new file mode 100644
index 00000000000000..8382bc99e72cb2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-10-medpsynit_it.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: Italian medpsynit BertForTokenClassification from IVN-RIN
+author: John Snow Labs
+name: medpsynit
+date: 2024-09-10
+tags: [it, open_source, onnx, token_classification, bert, ner]
+task: Named Entity Recognition
+language: it
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `medpsynit` is an Italian model originally trained by IVN-RIN.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medpsynit_it_5.5.0_3.0_1725934066431.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medpsynit_it_5.5.0_3.0_1725934066431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("medpsynit","it") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("medpsynit", "it")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
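+
+The token-level `ner` tags produced above can be grouped into entity chunks with Spark NLP's `NerConverter`; the sketch below assumes the model emits standard B-/I- style tags:
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Merge consecutive B-/I- token tags into full entity chunks.
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+chunked = ner_converter.transform(pipelineDF)
+chunked.selectExpr("explode(ner_chunk) as chunk") \
+    .selectExpr("chunk.result as entity_text", "chunk.metadata['entity'] as entity_label") \
+    .show(truncate=False)
+```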
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medpsynit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|it| +|Size:|409.2 MB| + +## References + +https://huggingface.co/IVN-RIN/MedPsyNIT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-multilingual_e5_base_finetuned_cola_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-10-multilingual_e5_base_finetuned_cola_pipeline_xx.md new file mode 100644 index 00000000000000..166b3c0014e0e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-multilingual_e5_base_finetuned_cola_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_e5_base_finetuned_cola_pipeline pipeline XlmRoBertaForSequenceClassification from anhnv125 +author: John Snow Labs +name: multilingual_e5_base_finetuned_cola_pipeline +date: 2024-09-10 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_e5_base_finetuned_cola_pipeline` is a Multilingual model originally trained by anhnv125. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_e5_base_finetuned_cola_pipeline_xx_5.5.0_3.0_1726004210362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_e5_base_finetuned_cola_pipeline_xx_5.5.0_3.0_1726004210362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multilingual_e5_base_finetuned_cola_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multilingual_e5_base_finetuned_cola_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_e5_base_finetuned_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|831.1 MB| + +## References + +https://huggingface.co/anhnv125/multilingual-e5-base-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline_en.md new file mode 100644 index 00000000000000..a89ac1764846d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline pipeline RoBertaForSequenceClassification from naman632 +author: John Snow Labs +name: nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline` is a English model originally trained by naman632. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline_en_5.5.0_3.0_1725965261978.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline_en_5.5.0_3.0_1725965261978.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_team_binarytoxicityclassifierforevaluationpurpose_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/naman632/NLP_team_binaryToxicityClassifierForEvaluationPurpose + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-norwegian_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-10-norwegian_roberta_base_en.md new file mode 100644 index 00000000000000..447f093d5a956d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-norwegian_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English norwegian_roberta_base RoBertaEmbeddings from patrickvonplaten +author: John Snow Labs +name: norwegian_roberta_base +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_roberta_base` is a English model originally trained by patrickvonplaten. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_roberta_base_en_5.5.0_3.0_1726005365961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_roberta_base_en_5.5.0_3.0_1726005365961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("norwegian_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("norwegian_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|462.9 MB| + +## References + +https://huggingface.co/patrickvonplaten/norwegian-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-ope_bert_v1_1_en.md b/docs/_posts/ahmedlone127/2024-09-10-ope_bert_v1_1_en.md new file mode 100644 index 00000000000000..2dac97081fc6ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-ope_bert_v1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ope_bert_v1_1 DistilBertEmbeddings from RyotaroOKabe +author: John Snow Labs +name: ope_bert_v1_1 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ope_bert_v1_1` is a English model originally trained by RyotaroOKabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ope_bert_v1_1_en_5.5.0_3.0_1726008024852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ope_bert_v1_1_en_5.5.0_3.0_1726008024852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("ope_bert_v1_1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("ope_bert_v1_1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ope_bert_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.6 MB| + +## References + +https://huggingface.co/RyotaroOKabe/ope_bert_v1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline_en.md new file mode 100644 index 00000000000000..b04c6075f8b416 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline pipeline MarianTransformer from riaraju +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline` is a English model originally trained by riaraju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline_en_5.5.0_3.0_1726002120235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline_en_5.5.0_3.0_1726002120235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_riaraju_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/riaraju/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-passage_ranker_mango_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-10-passage_ranker_mango_pipeline_xx.md new file mode 100644 index 00000000000000..280facebc8a9da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-passage_ranker_mango_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual passage_ranker_mango_pipeline pipeline BertForSequenceClassification from sinequa +author: John Snow Labs +name: passage_ranker_mango_pipeline +date: 2024-09-10 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`passage_ranker_mango_pipeline` is a Multilingual model originally trained by sinequa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/passage_ranker_mango_pipeline_xx_5.5.0_3.0_1725977482681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/passage_ranker_mango_pipeline_xx_5.5.0_3.0_1725977482681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("passage_ranker_mango_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("passage_ranker_mango_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|passage_ranker_mango_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/sinequa/passage-ranker.mango + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-pls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-pls_pipeline_en.md new file mode 100644 index 00000000000000..38600bfcfe9c64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-pls_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pls_pipeline pipeline MPNetEmbeddings from Watwat100 +author: John Snow Labs +name: pls_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pls_pipeline` is a English model originally trained by Watwat100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pls_pipeline_en_5.5.0_3.0_1725964195564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pls_pipeline_en_5.5.0_3.0_1725964195564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pls_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pls_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Watwat100/pls + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline_en.md new file mode 100644 index 00000000000000..16b8ab350effb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline pipeline MPNetEmbeddings from hroth01 +author: John Snow Labs +name: psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline` is a English model originally trained by hroth01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline_en_5.5.0_3.0_1725978239484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline_en_5.5.0_3.0_1725978239484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|psais_multi_qa_mpnet_base_dot_v1_20shot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/hroth01/psais-multi-qa-mpnet-base-dot-v1-20shot + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-q2d_half_5_en.md b/docs/_posts/ahmedlone127/2024-09-10-q2d_half_5_en.md new file mode 100644 index 00000000000000..d61fc9e49dc2f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-q2d_half_5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English q2d_half_5 MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_half_5 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_half_5` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_half_5_en_5.5.0_3.0_1725996895510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_half_5_en_5.5.0_3.0_1725996895510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("q2d_half_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("q2d_half_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
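To retrieve the raw sentence vectors from the `pipelineDF` produced above, one option (a sketch that only assumes the column names used in the snippet) is to explode the annotation column and read its `embeddings` field:

```python
from pyspark.sql import functions as F

# Each annotation in the "embeddings" column carries its numeric vector
# in the `embeddings` field of the annotation struct.
vectors = (
    pipelineDF
    .select(F.explode("embeddings").alias("annotation"))
    .select(F.col("annotation.embeddings").alias("vector"))
)
vectors.show(1, truncate=80)
```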
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_half_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_half_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_en.md b/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_en.md new file mode 100644 index 00000000000000..ca9223f59c57d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30 MPNetEmbeddings from shrinivasbjoshi +author: John Snow Labs +name: r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30` is a English model originally trained by shrinivasbjoshi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_en_5.5.0_3.0_1725963410303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_en_5.5.0_3.0_1725963410303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shrinivasbjoshi/r2-w266-setfit-mbti-multiclass-hypsearch-mpnet-nov30 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-roberta_base_bne_squad2_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_bne_squad2_spanish_es.md new file mode 100644 index 00000000000000..06d9bd27295de1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_bne_squad2_spanish_es.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Castilian, Spanish roberta_base_bne_squad2_spanish RoBertaForQuestionAnswering from somosnlp-hackathon-2022 +author: John Snow Labs +name: roberta_base_bne_squad2_spanish +date: 2024-09-10 +tags: [es, open_source, onnx, question_answering, roberta] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_squad2_spanish` is a Castilian, Spanish model originally trained by somosnlp-hackathon-2022. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_squad2_spanish_es_5.5.0_3.0_1725987247658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_squad2_spanish_es_5.5.0_3.0_1725987247658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_bne_squad2_spanish","es") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_bne_squad2_spanish", "es")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
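Once `pipelineDF` has been computed as above, the predicted answer text can be read from the `result` field of the `answer` annotation column. A short sketch, assuming only the column names from the snippet:

```python
# Each row's "answer" column holds the span predicted by the model;
# `result` contains the extracted answer string.
pipelineDF.selectExpr("explode(answer.result) as predicted_answer").show(truncate=False)
```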
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_squad2_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|es| +|Size:|455.8 MB| + +## References + +https://huggingface.co/somosnlp-hackathon-2022/roberta-base-bne-squad2-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-roberta_base_coqa_en.md b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_coqa_en.md new file mode 100644 index 00000000000000..4b39f60385099c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_coqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_coqa RoBertaForQuestionAnswering from WillHeld +author: John Snow Labs +name: roberta_base_coqa +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_coqa` is a English model originally trained by WillHeld. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_coqa_en_5.5.0_3.0_1725987087492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_coqa_en_5.5.0_3.0_1725987087492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_coqa","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_coqa", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_coqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|462.6 MB| + +## References + +https://huggingface.co/WillHeld/roberta-base-coqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-roberta_qa_cart_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-roberta_qa_cart_pipeline_en.md new file mode 100644 index 00000000000000..ff35442a52d967 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-roberta_qa_cart_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_cart_pipeline pipeline RoBertaForQuestionAnswering from pierrerappolt +author: John Snow Labs +name: roberta_qa_cart_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_cart_pipeline` is a English model originally trained by pierrerappolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_cart_pipeline_en_5.5.0_3.0_1725987512912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_cart_pipeline_en_5.5.0_3.0_1725987512912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_cart_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_cart_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_cart_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/pierrerappolt/cart + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-sequence_classification_msivanes_en.md b/docs/_posts/ahmedlone127/2024-09-10-sequence_classification_msivanes_en.md new file mode 100644 index 00000000000000..bbc6d1db2a756f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-sequence_classification_msivanes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sequence_classification_msivanes DistilBertForSequenceClassification from msivanes +author: John Snow Labs +name: sequence_classification_msivanes +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sequence_classification_msivanes` is a English model originally trained by msivanes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sequence_classification_msivanes_en_5.5.0_3.0_1725983749642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sequence_classification_msivanes_en_5.5.0_3.0_1725983749642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sequence_classification_msivanes","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sequence_classification_msivanes", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
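After running the pipeline above, the predicted label for each input row sits in the `result` field of the `class` column. A minimal sketch, assuming only the column names from the snippet:

```python
# "class" is the classifier's output column; `result` holds the predicted label(s).
pipelineDF.select("text", "class.result").show(truncate=False)
```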
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sequence_classification_msivanes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/msivanes/sequence_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_miscellaneous_misinformation_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_miscellaneous_misinformation_en.md new file mode 100644 index 00000000000000..79df1d0330ab29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_miscellaneous_misinformation_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English setfit_model_feb11_miscellaneous_misinformation MPNetEmbeddings from mitra-mir +author: John Snow Labs +name: setfit_model_feb11_miscellaneous_misinformation +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_model_feb11_miscellaneous_misinformation` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_model_feb11_miscellaneous_misinformation_en_5.5.0_3.0_1725995235797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_model_feb11_miscellaneous_misinformation_en_5.5.0_3.0_1725995235797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("setfit_model_feb11_miscellaneous_misinformation","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("setfit_model_feb11_miscellaneous_misinformation","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_model_feb11_miscellaneous_misinformation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitra-mir/setfit-model-Feb11-Miscellaneous-Misinformation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_misinformation_on_govt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_misinformation_on_govt_pipeline_en.md new file mode 100644 index 00000000000000..738b4862a8dba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_misinformation_on_govt_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English setfit_model_feb11_misinformation_on_govt_pipeline pipeline MPNetEmbeddings from mitra-mir +author: John Snow Labs +name: setfit_model_feb11_misinformation_on_govt_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_model_feb11_misinformation_on_govt_pipeline` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_model_feb11_misinformation_on_govt_pipeline_en_5.5.0_3.0_1725969844462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_model_feb11_misinformation_on_govt_pipeline_en_5.5.0_3.0_1725969844462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("setfit_model_feb11_misinformation_on_govt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("setfit_model_feb11_misinformation_on_govt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_model_feb11_misinformation_on_govt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitra-mir/setfit-model-Feb11-Misinformation-on-Govt + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_misinformation_on_mandates_public_health_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_misinformation_on_mandates_public_health_en.md new file mode 100644 index 00000000000000..cdbb2ee21df003 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb11_misinformation_on_mandates_public_health_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English setfit_model_feb11_misinformation_on_mandates_public_health MPNetEmbeddings from mitra-mir +author: John Snow Labs +name: setfit_model_feb11_misinformation_on_mandates_public_health +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_model_feb11_misinformation_on_mandates_public_health` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_model_feb11_misinformation_on_mandates_public_health_en_5.5.0_3.0_1725978811803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_model_feb11_misinformation_on_mandates_public_health_en_5.5.0_3.0_1725978811803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("setfit_model_feb11_misinformation_on_mandates_public_health","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("setfit_model_feb11_misinformation_on_mandates_public_health","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_model_feb11_misinformation_on_mandates_public_health| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitra-mir/setfit-model-Feb11-Misinformation-on-Mandates-Public-Health \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfitrrcrececnr_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfitrrcrececnr_en.md new file mode 100644 index 00000000000000..92e4f13198ef96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfitrrcrececnr_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English setfitrrcrececnr MPNetEmbeddings from mann2107 +author: John Snow Labs +name: setfitrrcrececnr +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfitrrcrececnr` is a English model originally trained by mann2107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfitrrcrececnr_en_5.5.0_3.0_1725995081397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfitrrcrececnr_en_5.5.0_3.0_1725995081397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("setfitrrcrececnr","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("setfitrrcrececnr","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfitrrcrececnr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/mann2107/setfitrrcrececnr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-squad_qa_model_harrisonlowe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-squad_qa_model_harrisonlowe_pipeline_en.md new file mode 100644 index 00000000000000..c4594f00e9fdd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-squad_qa_model_harrisonlowe_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squad_qa_model_harrisonlowe_pipeline pipeline DistilBertForQuestionAnswering from HarrisonLowe +author: John Snow Labs +name: squad_qa_model_harrisonlowe_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad_qa_model_harrisonlowe_pipeline` is a English model originally trained by HarrisonLowe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad_qa_model_harrisonlowe_pipeline_en_5.5.0_3.0_1725932349687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad_qa_model_harrisonlowe_pipeline_en_5.5.0_3.0_1725932349687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squad_qa_model_harrisonlowe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squad_qa_model_harrisonlowe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad_qa_model_harrisonlowe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/HarrisonLowe/squad_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-sst2_fewshot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-sst2_fewshot_pipeline_en.md new file mode 100644 index 00000000000000..c23c8e6f494c20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-sst2_fewshot_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English sst2_fewshot_pipeline pipeline MPNetEmbeddings from pig4431 +author: John Snow Labs +name: sst2_fewshot_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_fewshot_pipeline` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_fewshot_pipeline_en_5.5.0_3.0_1725978525437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_fewshot_pipeline_en_5.5.0_3.0_1725978525437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sst2_fewshot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sst2_fewshot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_fewshot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/pig4431/SST2_fewshot + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-task_intent_detector_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-task_intent_detector_pipeline_en.md new file mode 100644 index 00000000000000..b402239b019df0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-task_intent_detector_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English task_intent_detector_pipeline pipeline RoBertaForSequenceClassification from NOVA-vision-language +author: John Snow Labs +name: task_intent_detector_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_intent_detector_pipeline` is a English model originally trained by NOVA-vision-language. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_intent_detector_pipeline_en_5.5.0_3.0_1725972073687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_intent_detector_pipeline_en_5.5.0_3.0_1725972073687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("task_intent_detector_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("task_intent_detector_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_intent_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.4 MB| + +## References + +https://huggingface.co/NOVA-vision-language/task-intent-detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-test_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-10-test_model_2_en.md new file mode 100644 index 00000000000000..a10ecbe8ad9214 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-test_model_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model_2 DistilBertEmbeddings from TamBeo +author: John Snow Labs +name: test_model_2 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_2` is a English model originally trained by TamBeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_2_en_5.5.0_3.0_1725935301921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_2_en_5.5.0_3.0_1725935301921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("test_model_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("test_model_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.1 MB| + +## References + +https://huggingface.co/TamBeo/test_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-test_model_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-test_model_2_pipeline_en.md new file mode 100644 index 00000000000000..e491d69b9a094c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-test_model_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_2_pipeline pipeline DistilBertEmbeddings from TamBeo +author: John Snow Labs +name: test_model_2_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_2_pipeline` is a English model originally trained by TamBeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_2_pipeline_en_5.5.0_3.0_1725935313453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_2_pipeline_en_5.5.0_3.0_1725935313453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/TamBeo/test_model_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-testing_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-testing_final_pipeline_en.md new file mode 100644 index 00000000000000..53dc23273c4cab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-testing_final_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English testing_final_pipeline pipeline MPNetEmbeddings from maneprajakta +author: John Snow Labs +name: testing_final_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_final_pipeline` is a English model originally trained by maneprajakta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_final_pipeline_en_5.5.0_3.0_1725936659335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_final_pipeline_en_5.5.0_3.0_1725936659335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testing_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testing_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/maneprajakta/testing_final + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-text_classification_model_10042024_en.md b/docs/_posts/ahmedlone127/2024-09-10-text_classification_model_10042024_en.md new file mode 100644 index 00000000000000..bdf58b60108839 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-text_classification_model_10042024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_model_10042024 DistilBertForSequenceClassification from vishnun0027 +author: John Snow Labs +name: text_classification_model_10042024 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_model_10042024` is a English model originally trained by vishnun0027. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_model_10042024_en_5.5.0_3.0_1725983742381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_model_10042024_en_5.5.0_3.0_1725983742381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_model_10042024","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_model_10042024", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_model_10042024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vishnun0027/Text_classification_model_10042024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-tonga_tonga_islands_classifier_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-tonga_tonga_islands_classifier_v2_pipeline_en.md new file mode 100644 index 00000000000000..263b9de5e7f2fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-tonga_tonga_islands_classifier_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tonga_tonga_islands_classifier_v2_pipeline pipeline MPNetEmbeddings from futuredatascience +author: John Snow Labs +name: tonga_tonga_islands_classifier_v2_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tonga_tonga_islands_classifier_v2_pipeline` is a English model originally trained by futuredatascience. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tonga_tonga_islands_classifier_v2_pipeline_en_5.5.0_3.0_1725969599068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tonga_tonga_islands_classifier_v2_pipeline_en_5.5.0_3.0_1725969599068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tonga_tonga_islands_classifier_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tonga_tonga_islands_classifier_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tonga_tonga_islands_classifier_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/futuredatascience/to-classifier-v2 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-topic_topic_temporal_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-10-topic_topic_temporal_bernice_en.md new file mode 100644 index 00000000000000..e9a64548e353e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-topic_topic_temporal_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_temporal_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_temporal_bernice +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_temporal_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_temporal_bernice_en_5.5.0_3.0_1726004330920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_temporal_bernice_en_5.5.0_3.0_1726004330920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_temporal_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_temporal_bernice", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_temporal_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.3 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_temporal-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-training_v2_ru.md b/docs/_posts/ahmedlone127/2024-09-10-training_v2_ru.md new file mode 100644 index 00000000000000..7e6d68126ece22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-training_v2_ru.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Russian training_v2 WhisperForCTC from SofiaK +author: John Snow Labs +name: training_v2 +date: 2024-09-10 +tags: [ru, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`training_v2` is a Russian model originally trained by SofiaK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/training_v2_ru_5.5.0_3.0_1725949039617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/training_v2_ru_5.5.0_3.0_1725949039617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# `data` is assumed to be an existing DataFrame with an "audio_content" column of raw audio samples.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("training_v2","ru") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be an existing DataFrame with an "audio_content" column of raw audio samples.
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("training_v2", "ru")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
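The snippets above assume a `data` DataFrame that already contains an `audio_content` column. One hedged way to build such a frame from a single local file is sketched below; the `librosa` dependency, the placeholder file path, and the 16 kHz mono sampling rate are illustrative assumptions, not requirements stated in this card.

```python
import librosa  # assumed audio-decoding helper, not part of Spark NLP

# Whisper-style models are typically fed 16 kHz mono audio as a float array.
waveform, _ = librosa.load("sample.wav", sr=16000)  # "sample.wav" is a placeholder path

# Depending on the Spark NLP version, the column may need to be an array of floats;
# cast accordingly if AudioAssembler rejects array<double>.
data = spark.createDataFrame([(waveform.tolist(),)], ["audio_content"])
```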
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|training_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ru| +|Size:|641.7 MB| + +## References + +https://huggingface.co/SofiaK/training-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_dutch_small_nl.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_dutch_small_nl.md new file mode 100644 index 00000000000000..72fce9ffb13130 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_dutch_small_nl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dutch, Flemish whisper_dutch_small WhisperForCTC from hannatoenbreker +author: John Snow Labs +name: whisper_dutch_small +date: 2024-09-10 +tags: [nl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_dutch_small` is a Dutch, Flemish model originally trained by hannatoenbreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_dutch_small_nl_5.5.0_3.0_1725953218954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_dutch_small_nl_5.5.0_3.0_1725953218954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_dutch_small","nl") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# Placeholder input: one second of silence at 16 kHz. Replace with real audio
+# samples (an array of floats) to get a meaningful transcription.
+rawFloats = [0.0] * 16000
+data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
+
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_dutch_small", "nl")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// Placeholder input: replace with real audio samples (floats at 16 kHz).
+val rawFloats = Array.fill(16000)(0.0f)
+val data = Seq(rawFloats).toDF("audio_content")
+
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_dutch_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hannatoenbreker/whisper-dutch-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_small_chinese_hanson92828_en.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_chinese_hanson92828_en.md new file mode 100644 index 00000000000000..8343df3065e815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_chinese_hanson92828_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_chinese_hanson92828 WhisperForCTC from hanson92828 +author: John Snow Labs +name: whisper_small_chinese_hanson92828 +date: 2024-09-10 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_hanson92828` is a English model originally trained by hanson92828. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_hanson92828_en_5.5.0_3.0_1725940175677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_hanson92828_en_5.5.0_3.0_1725940175677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_chinese_hanson92828","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# Placeholder input: one second of silence at 16 kHz. Replace with real audio
+# samples (an array of floats) to get a meaningful transcription.
+rawFloats = [0.0] * 16000
+data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
+
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_chinese_hanson92828", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// Placeholder input: replace with real audio samples (floats at 16 kHz).
+val rawFloats = Array.fill(16000)(0.0f)
+val data = Seq(rawFloats).toDF("audio_content")
+
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_hanson92828| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hanson92828/whisper-small-chinese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_small_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_tuned_en.md new file mode 100644 index 00000000000000..1973cfd94c4847 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_tuned_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_tuned WhisperForCTC from olivvan +author: John Snow Labs +name: whisper_small_tuned +date: 2024-09-10 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tuned` is a English model originally trained by olivvan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tuned_en_5.5.0_3.0_1725954186378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tuned_en_5.5.0_3.0_1725954186378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_tuned","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# Placeholder input: one second of silence at 16 kHz. Replace with real audio
+# samples (an array of floats) to get a meaningful transcription.
+rawFloats = [0.0] * 16000
+data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
+
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_tuned", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// Placeholder input: replace with real audio samples (floats at 16 kHz).
+val rawFloats = Array.fill(16000)(0.0f)
+val data = Seq(rawFloats).toDF("audio_content")
+
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/olivvan/whisper-small-tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_all_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_all_en.md new file mode 100644 index 00000000000000..10915179765cf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_all_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_all XlmRoBertaForSequenceClassification from xeventminer +author: John Snow Labs +name: xlm_roberta_all +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_all` is a English model originally trained by xeventminer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_all_en_5.5.0_3.0_1725981758219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_all_en_5.5.0_3.0_1725981758219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_all","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_all", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
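+
+For quick, single-document predictions without assembling a DataFrame, the fitted pipeline can be wrapped in a `LightPipeline` (a sketch; the dictionary keys follow the output column names set above):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+result = light.annotate("I love spark-nlp")
+print(result["class"])  # predicted label(s) for this text
+```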
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|809.1 MB| + +## References + +https://huggingface.co/xeventminer/xlm-roberta-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_youngbreadho_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_youngbreadho_en.md new file mode 100644 index 00000000000000..8645a618aa58a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_youngbreadho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_youngbreadho XlmRoBertaForTokenClassification from youngbreadho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_youngbreadho +date: 2024-09-10 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_youngbreadho` is a English model originally trained by youngbreadho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_youngbreadho_en_5.5.0_3.0_1725972840891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_youngbreadho_en_5.5.0_3.0_1725972840891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_youngbreadho","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_youngbreadho", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
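+
+After `transform`, the per-token predictions live in the `ner` output column, aligned with the tokens they were produced for. A simple way to eyeball them side by side (column names as configured above):
+
+```python
+# token.result and ner.result are parallel arrays: one predicted tag per token.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```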
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_youngbreadho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.5 MB| + +## References + +https://huggingface.co/youngbreadho/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline_en.md new file mode 100644 index 00000000000000..75905adeab95de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline pipeline XlmRoBertaForTokenClassification from ericklerouge123 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline` is a English model originally trained by ericklerouge123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline_en_5.5.0_3.0_1726010592629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline_en_5.5.0_3.0_1726010592629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
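+
+In the snippets above, `df` stands for any DataFrame with a `text` column. For ad-hoc checks, the downloaded pipeline can also be called directly on a string; inspecting the returned keys shows which output columns the bundled stages expose (a sketch):
+
+```python
+result = pipeline.annotate("John Snow Labs is based in Delaware.")
+print(result.keys())  # e.g. document, token and the token classifier's output
+```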
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ericklerouge123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ericklerouge123/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_removed_0530_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_removed_0530_en.md new file mode 100644 index 00000000000000..257b0c7bd59664 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_removed_0530_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_removed_0530 XlmRoBertaForSequenceClassification from YeRyeongLee +author: John Snow Labs +name: xlm_roberta_base_finetuned_removed_0530 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_removed_0530` is a English model originally trained by YeRyeongLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_removed_0530_en_5.5.0_3.0_1725967484473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_removed_0530_en_5.5.0_3.0_1725967484473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_removed_0530","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_removed_0530", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_removed_0530| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|864.0 MB| + +## References + +https://huggingface.co/YeRyeongLee/xlm-roberta-base-finetuned-removed-0530 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlmr_finetuned_igbo_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlmr_finetuned_igbo_en.md new file mode 100644 index 00000000000000..554f224ba725b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlmr_finetuned_igbo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_finetuned_igbo XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: xlmr_finetuned_igbo +date: 2024-09-10 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_finetuned_igbo` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_finetuned_igbo_en_5.5.0_3.0_1725984960806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_finetuned_igbo_en_5.5.0_3.0_1725984960806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_finetuned_igbo","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_finetuned_igbo", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_finetuned_igbo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|866.2 MB| + +## References + +https://huggingface.co/grace-pro/xlmr-finetuned-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-admedia_brasil_en.md b/docs/_posts/ahmedlone127/2024-09-11-admedia_brasil_en.md new file mode 100644 index 00000000000000..40de88f0ec2aea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-admedia_brasil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English admedia_brasil BertForSequenceClassification from Fcevalerio +author: John Snow Labs +name: admedia_brasil +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`admedia_brasil` is a English model originally trained by Fcevalerio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/admedia_brasil_en_5.5.0_3.0_1726015545724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/admedia_brasil_en_5.5.0_3.0_1726015545724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("admedia_brasil","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("admedia_brasil", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|admedia_brasil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|627.7 MB| + +## References + +https://huggingface.co/Fcevalerio/AdMedia_Brasil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-all_mpnet_base_v2_2022_11_07_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-all_mpnet_base_v2_2022_11_07_pipeline_en.md new file mode 100644 index 00000000000000..a7c5d4bcadbeb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-all_mpnet_base_v2_2022_11_07_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_2022_11_07_pipeline pipeline MPNetEmbeddings from ronanki +author: John Snow Labs +name: all_mpnet_base_v2_2022_11_07_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_2022_11_07_pipeline` is a English model originally trained by ronanki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_2022_11_07_pipeline_en_5.5.0_3.0_1726089341870.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_2022_11_07_pipeline_en_5.5.0_3.0_1726089341870.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_mpnet_base_v2_2022_11_07_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_mpnet_base_v2_2022_11_07_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_2022_11_07_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/ronanki/all-mpnet-base-v2-2022-11-07 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-all_mpnet_base_v2_anteater_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-all_mpnet_base_v2_anteater_pipeline_en.md new file mode 100644 index 00000000000000..c87db8ebe5d44c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-all_mpnet_base_v2_anteater_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_anteater_pipeline pipeline MPNetEmbeddings from IconicAI +author: John Snow Labs +name: all_mpnet_base_v2_anteater_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_anteater_pipeline` is a English model originally trained by IconicAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_anteater_pipeline_en_5.5.0_3.0_1726054962293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_anteater_pipeline_en_5.5.0_3.0_1726054962293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_mpnet_base_v2_anteater_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_mpnet_base_v2_anteater_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_anteater_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/IconicAI/all-mpnet-base-v2-anteater + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_15_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_15_16_5_en.md new file mode 100644 index 00000000000000..b484026156d34c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_15_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_15_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_15_16_5 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_15_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_15_16_5_en_5.5.0_3.0_1726096729196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_15_16_5_en_5.5.0_3.0_1726096729196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_15_16_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_15_16_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_15_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-15-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_credit_cards_4_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_credit_cards_4_16_5_en.md new file mode 100644 index 00000000000000..5b944b18a7d486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_credit_cards_4_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_credit_cards_4_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_credit_cards_4_16_5 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_credit_cards_4_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_credit_cards_4_16_5_en_5.5.0_3.0_1726072031950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_credit_cards_4_16_5_en_5.5.0_3.0_1726072031950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_credit_cards_4_16_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_credit_cards_4_16_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_credit_cards_4_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-credit_cards-4-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-amazon_en.md b/docs/_posts/ahmedlone127/2024-09-11-amazon_en.md new file mode 100644 index 00000000000000..458312fb454a1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-amazon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amazon DistilBertForSequenceClassification from bl03 +author: John Snow Labs +name: amazon +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon` is a English model originally trained by bl03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_en_5.5.0_3.0_1726017794091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_en_5.5.0_3.0_1726017794091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
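+
+A fitted pipeline like the one above can be persisted and reloaded later without refitting, which also avoids re-downloading the pretrained weights on every run (a sketch; the path is an arbitrary example):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline, then load it back for reuse.
+pipelineModel.write().overwrite().save("/tmp/amazon_distilbert_pipeline")
+restored = PipelineModel.load("/tmp/amazon_distilbert_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```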
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bl03/amazon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-amazon_helpfulness_classification_roberta_best_f1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-amazon_helpfulness_classification_roberta_best_f1_pipeline_en.md new file mode 100644 index 00000000000000..d1a7e567068d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-amazon_helpfulness_classification_roberta_best_f1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_helpfulness_classification_roberta_best_f1_pipeline pipeline RoBertaForSequenceClassification from ltuzova +author: John Snow Labs +name: amazon_helpfulness_classification_roberta_best_f1_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_helpfulness_classification_roberta_best_f1_pipeline` is a English model originally trained by ltuzova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_helpfulness_classification_roberta_best_f1_pipeline_en_5.5.0_3.0_1726096863674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_helpfulness_classification_roberta_best_f1_pipeline_en_5.5.0_3.0_1726096863674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazon_helpfulness_classification_roberta_best_f1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazon_helpfulness_classification_roberta_best_f1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_helpfulness_classification_roberta_best_f1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/ltuzova/amazon_helpfulness_classification_roberta_best_f1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..e9fe187cb13190 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726036388617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726036388617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|31.9 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_1.25M_aochildes-french1.25M-with-Masking-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..6d0c29ec0de714 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726062159443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726062159443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_1_25m_aochildes_french1_25m_without_masking_seed6_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|31.9 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_1.25M_aochildes-french1.25M-without-Masking-seed6-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad_en.md new file mode 100644 index 00000000000000..3b6da4276988e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726039957438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726039957438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
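+
+The extracted answer span for each question/context pair ends up in the `answer` output column. A quick check (column names as configured above):
+
+```python
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```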
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed3_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_wikipedia1_2.5M-with-Masking-seed3-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline_en.md new file mode 100644 index 00000000000000..81fb1591b19f8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline_en_5.5.0_3.0_1726058095899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline_en_5.5.0_3.0_1726058095899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_finetuned_french_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_wikipedia1_2.5M-without-Masking-finetuned-french-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_with_masking_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_with_masking_finetuned_squad_en.md new file mode 100644 index 00000000000000..0394e82d6919ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_with_masking_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_with_masking_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_with_masking_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_with_masking_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_with_masking_finetuned_squad_en_5.5.0_3.0_1726062177335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_with_masking_finetuned_squad_en_5.5.0_3.0_1726062177335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into annotation columns
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_with_masking_finetuned_squad","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_with_masking_finetuned_squad", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
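Continuing from the Python snippet above, the predicted span can be read back from the `answer` output column; the `result` field below follows the standard Spark NLP annotation schema.

```python
# Each row of "answer" holds an array of annotations; "result" is the extracted span text.
pipelineDF.selectExpr("explode(answer.result) as answer_text").show(truncate=False)
```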
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_with_masking_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa_with-Masking-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_classifier_bert_base_english_spanish_codemix_cased_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-11-bert_classifier_bert_base_english_spanish_codemix_cased_pipeline_xx.md new file mode 100644 index 00000000000000..8db1ca6b3df644 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_classifier_bert_base_english_spanish_codemix_cased_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_classifier_bert_base_english_spanish_codemix_cased_pipeline pipeline BertForSequenceClassification from rohanrajpal +author: John Snow Labs +name: bert_classifier_bert_base_english_spanish_codemix_cased_pipeline +date: 2024-09-11 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_bert_base_english_spanish_codemix_cased_pipeline` is a Multilingual model originally trained by rohanrajpal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_bert_base_english_spanish_codemix_cased_pipeline_xx_5.5.0_3.0_1726091986700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_bert_base_english_spanish_codemix_cased_pipeline_xx_5.5.0_3.0_1726091986700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_classifier_bert_base_english_spanish_codemix_cased_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_classifier_bert_base_english_spanish_codemix_cased_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
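For quick single-sentence checks, the same pretrained pipeline can be called through `annotate`, which returns a plain Python dict keyed by the pipeline's output columns; the code-mixed example sentence below is only illustrative.

```python
# One-off prediction without building a DataFrame first.
result = pipeline.annotate("I love this pelicula, it was muy buena!")
print(result.keys())  # output column names produced by the pipeline
print(result)         # includes the predicted class label
```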
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_bert_base_english_spanish_codemix_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|667.3 MB| + +## References + +https://huggingface.co/rohanrajpal/bert-base-en-es-codemix-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_large_uncased_ner_conll2003_andi611_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_large_uncased_ner_conll2003_andi611_pipeline_en.md new file mode 100644 index 00000000000000..7977852141b8b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_large_uncased_ner_conll2003_andi611_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_ner_conll2003_andi611_pipeline pipeline BertForTokenClassification from andi611 +author: John Snow Labs +name: bert_large_uncased_ner_conll2003_andi611_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_ner_conll2003_andi611_pipeline` is a English model originally trained by andi611. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_ner_conll2003_andi611_pipeline_en_5.5.0_3.0_1726026581925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_ner_conll2003_andi611_pipeline_en_5.5.0_3.0_1726026581925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_ner_conll2003_andi611_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_ner_conll2003_andi611_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
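The `df` referenced above is not defined in the snippet; a minimal sketch is given below. The single `text` column is an assumption based on the DocumentAssembler stage listed under Included Models.

```python
# Hypothetical input for the NER pipeline; the "text" column name is assumed.
df = spark.createDataFrame([("John Snow Labs is based in Delaware.",)], ["text"])

annotations = pipeline.transform(df)
annotations.printSchema()  # lists the token- and entity-level columns the pipeline adds
```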
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_ner_conll2003_andi611_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/andi611/bert-large-uncased-ner-conll2003 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_minangkabau_velocity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_minangkabau_velocity_pipeline_en.md new file mode 100644 index 00000000000000..c8d13692586f60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_minangkabau_velocity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_minangkabau_velocity_pipeline pipeline BertForSequenceClassification from hsiungc +author: John Snow Labs +name: bert_minangkabau_velocity_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_minangkabau_velocity_pipeline` is a English model originally trained by hsiungc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_minangkabau_velocity_pipeline_en_5.5.0_3.0_1726016077421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_minangkabau_velocity_pipeline_en_5.5.0_3.0_1726016077421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_minangkabau_velocity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_minangkabau_velocity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_minangkabau_velocity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/hsiungc/bert-min-velocity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_num_measures_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_num_measures_pipeline_en.md new file mode 100644 index 00000000000000..ab767b244985b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_num_measures_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_num_measures_pipeline pipeline BertForSequenceClassification from hsiungc +author: John Snow Labs +name: bert_num_measures_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_num_measures_pipeline` is a English model originally trained by hsiungc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_num_measures_pipeline_en_5.5.0_3.0_1726015796527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_num_measures_pipeline_en_5.5.0_3.0_1726015796527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_num_measures_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_num_measures_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_num_measures_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/hsiungc/bert-num-measures + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bertweet_large_norwegian_augment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-bertweet_large_norwegian_augment_pipeline_en.md new file mode 100644 index 00000000000000..60e4b4c4f0d9df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bertweet_large_norwegian_augment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertweet_large_norwegian_augment_pipeline pipeline RoBertaForSequenceClassification from dvruette +author: John Snow Labs +name: bertweet_large_norwegian_augment_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_norwegian_augment_pipeline` is a English model originally trained by dvruette. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_norwegian_augment_pipeline_en_5.5.0_3.0_1726071695710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_norwegian_augment_pipeline_en_5.5.0_3.0_1726071695710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertweet_large_norwegian_augment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertweet_large_norwegian_augment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_norwegian_augment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dvruette/bertweet-large_no-augment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_32_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_32_100_pipeline_en.md new file mode 100644 index 00000000000000..a13b752ddd010d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_32_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English best_model_yelp_polarity_32_100_pipeline pipeline AlbertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_yelp_polarity_32_100_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_yelp_polarity_32_100_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_32_100_pipeline_en_5.5.0_3.0_1726013597060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_32_100_pipeline_en_5.5.0_3.0_1726013597060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("best_model_yelp_polarity_32_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("best_model_yelp_polarity_32_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_yelp_polarity_32_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/simonycl/best_model-yelp_polarity-32-100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bsc_bio_ehr_spanish_vih_juicio_anam_urgen_en.md b/docs/_posts/ahmedlone127/2024-09-11-bsc_bio_ehr_spanish_vih_juicio_anam_urgen_en.md new file mode 100644 index 00000000000000..a846df2e9f18fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bsc_bio_ehr_spanish_vih_juicio_anam_urgen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_vih_juicio_anam_urgen RoBertaForSequenceClassification from Wariano +author: John Snow Labs +name: bsc_bio_ehr_spanish_vih_juicio_anam_urgen +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_vih_juicio_anam_urgen` is a English model originally trained by Wariano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_juicio_anam_urgen_en_5.5.0_3.0_1726089816657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_juicio_anam_urgen_en_5.5.0_3.0_1726089816657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the document and token annotation columns
sequenceClassifier = RoBertaForSequenceClassification.pretrained("bsc_bio_ehr_spanish_vih_juicio_anam_urgen","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bsc_bio_ehr_spanish_vih_juicio_anam_urgen", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
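As a follow-up to the Python example above, the predicted label for each row sits in the `result` field of the `class` output column:

```python
# "class.result" is an array with one predicted label per document.
pipelineDF.select("text", "class.result").show(truncate=False)
```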
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_vih_juicio_anam_urgen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Wariano/bsc-bio-ehr-es-vih-juicio_anam_urgen \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_hannestt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_hannestt_pipeline_en.md new file mode 100644 index 00000000000000..c12cc8c4603041 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_hannestt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_hannestt_pipeline pipeline DistilBertForSequenceClassification from hannestt +author: John Snow Labs +name: burmese_awesome_model_hannestt_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_hannestt_pipeline` is a English model originally trained by hannestt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hannestt_pipeline_en_5.5.0_3.0_1726017987974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hannestt_pipeline_en_5.5.0_3.0_1726017987974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_hannestt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_hannestt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_hannestt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/hannestt/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_prezily_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_prezily_pipeline_en.md new file mode 100644 index 00000000000000..46431caf827836 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_prezily_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_prezily_pipeline pipeline DistilBertForSequenceClassification from Prezily +author: John Snow Labs +name: burmese_awesome_model_prezily_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_prezily_pipeline` is a English model originally trained by Prezily. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_prezily_pipeline_en_5.5.0_3.0_1726052507565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_prezily_pipeline_en_5.5.0_3.0_1726052507565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_prezily_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_prezily_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_prezily_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Prezily/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_st2220_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_st2220_pipeline_en.md new file mode 100644 index 00000000000000..000f464b8d034b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_st2220_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_st2220_pipeline pipeline DistilBertForSequenceClassification from st2220 +author: John Snow Labs +name: burmese_awesome_model_st2220_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_st2220_pipeline` is a English model originally trained by st2220. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_st2220_pipeline_en_5.5.0_3.0_1726017971306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_st2220_pipeline_en_5.5.0_3.0_1726017971306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_st2220_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_st2220_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_st2220_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/st2220/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_hhjingbo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_hhjingbo_pipeline_en.md new file mode 100644 index 00000000000000..b7da56b82b24bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_hhjingbo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_hhjingbo_pipeline pipeline DistilBertForQuestionAnswering from HHJingbo +author: John Snow Labs +name: burmese_awesome_qa_model_hhjingbo_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_hhjingbo_pipeline` is a English model originally trained by HHJingbo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hhjingbo_pipeline_en_5.5.0_3.0_1726016744821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hhjingbo_pipeline_en_5.5.0_3.0_1726016744821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_hhjingbo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_hhjingbo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_hhjingbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/HHJingbo/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_ih138_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_ih138_en.md new file mode 100644 index 00000000000000..c706d5e6c61212 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_ih138_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ih138 DistilBertForQuestionAnswering from ih138 +author: John Snow Labs +name: burmese_awesome_qa_model_ih138 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ih138` is a English model originally trained by ih138. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ih138_en_5.5.0_3.0_1726087899423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ih138_en_5.5.0_3.0_1726087899423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into annotation columns
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ih138","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ih138", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
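For low-latency, single-question inference the fitted pipeline above can be wrapped in a `LightPipeline`. This is a sketch: the two-argument `fullAnnotate(question, context)` call follows the usual Spark NLP question-answering pattern, and the question/context strings are placeholders.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Question first, context second; returns a list with one result dict per input.
result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(result[0]["answer"][0].result)  # predicted answer span
```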
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ih138| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ih138/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_lucija26_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_lucija26_en.md new file mode 100644 index 00000000000000..42ae7aab98660e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_lucija26_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_lucija26 DistilBertForQuestionAnswering from lucija26 +author: John Snow Labs +name: burmese_awesome_qa_model_lucija26 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_lucija26` is a English model originally trained by lucija26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_lucija26_en_5.5.0_3.0_1726016792013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_lucija26_en_5.5.0_3.0_1726016792013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into annotation columns
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_lucija26","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_lucija26", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_lucija26| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/lucija26/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_prithviraj_maurya_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_prithviraj_maurya_en.md new file mode 100644 index 00000000000000..fe11252a028f73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_prithviraj_maurya_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_prithviraj_maurya DistilBertForQuestionAnswering from prithviraj-maurya +author: John Snow Labs +name: burmese_awesome_qa_model_prithviraj_maurya +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_prithviraj_maurya` is a English model originally trained by prithviraj-maurya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_prithviraj_maurya_en_5.5.0_3.0_1726087699983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_prithviraj_maurya_en_5.5.0_3.0_1726087699983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into annotation columns
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_prithviraj_maurya","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_prithviraj_maurya", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_prithviraj_maurya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/prithviraj-maurya/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_multiclass_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_multiclass_model_pipeline_en.md new file mode 100644 index 00000000000000..185daf1840ae08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_multiclass_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_multiclass_model_pipeline pipeline DistilBertForSequenceClassification from Arsalan8 +author: John Snow Labs +name: burmese_multiclass_model_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_multiclass_model_pipeline` is a English model originally trained by Arsalan8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_multiclass_model_pipeline_en_5.5.0_3.0_1726052332723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_multiclass_model_pipeline_en_5.5.0_3.0_1726052332723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_multiclass_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_multiclass_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_multiclass_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Arsalan8/my_multiclass_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-cebuanolanguagemodel_en.md b/docs/_posts/ahmedlone127/2024-09-11-cebuanolanguagemodel_en.md new file mode 100644 index 00000000000000..2b01d7fc391fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-cebuanolanguagemodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cebuanolanguagemodel RoBertaEmbeddings from jfernandez +author: John Snow Labs +name: cebuanolanguagemodel +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cebuanolanguagemodel` is a English model originally trained by jfernandez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cebuanolanguagemodel_en_5.5.0_3.0_1726032420687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cebuanolanguagemodel_en_5.5.0_3.0_1726032420687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("cebuanolanguagemodel","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("cebuanolanguagemodel","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
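Continuing from the snippet above, the per-token vectors live in the `embeddings` field of each annotation in the `embeddings` output column:

```python
# One row per token: the token text and its embedding vector.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```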
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cebuanolanguagemodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.1 MB| + +## References + +https://huggingface.co/jfernandez/cebuanoLanguageModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-coha1950s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-coha1950s_pipeline_en.md new file mode 100644 index 00000000000000..02695dda2e4a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-coha1950s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1950s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1950s_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1950s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1950s_pipeline_en_5.5.0_3.0_1726032424189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1950s_pipeline_en_5.5.0_3.0_1726032424189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1950s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1950s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1950s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.2 MB| + +## References + +https://huggingface.co/simonmun/COHA1950s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-cuad_distil_parties_08_25_en.md b/docs/_posts/ahmedlone127/2024-09-11-cuad_distil_parties_08_25_en.md new file mode 100644 index 00000000000000..d6f68a45a782c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-cuad_distil_parties_08_25_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English cuad_distil_parties_08_25 DistilBertForQuestionAnswering from saraks +author: John Snow Labs +name: cuad_distil_parties_08_25 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuad_distil_parties_08_25` is a English model originally trained by saraks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuad_distil_parties_08_25_en_5.5.0_3.0_1726088130203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuad_distil_parties_08_25_en_5.5.0_3.0_1726088130203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into annotation columns
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("cuad_distil_parties_08_25","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("cuad_distil_parties_08_25", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuad_distil_parties_08_25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/saraks/cuad-distil-parties-08-25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_sentiment_mixture_en.md b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_sentiment_mixture_en.md new file mode 100644 index 00000000000000..2589d03dc24f45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_sentiment_mixture_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_large_sentiment_mixture DeBertaForSequenceClassification from brutusxu +author: John Snow Labs +name: deberta_v3_large_sentiment_mixture +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_sentiment_mixture` is a English model originally trained by brutusxu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_sentiment_mixture_en_5.5.0_3.0_1726030910066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_sentiment_mixture_en_5.5.0_3.0_1726030910066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the document and token annotation columns
sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_sentiment_mixture","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_sentiment_mixture", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
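Beyond the bare label, each annotation in the `class` column carries a metadata map that typically holds per-label confidence scores; a small sketch of reading both is shown below.

```python
# "result" is the predicted label; "metadata" usually maps each label to its score.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```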
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_sentiment_mixture| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/brutusxu/deberta-v3-large-sentiment-mixture \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_sentiment_mixture_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_sentiment_mixture_pipeline_en.md new file mode 100644 index 00000000000000..78efa05ca120a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_sentiment_mixture_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_large_sentiment_mixture_pipeline pipeline DeBertaForSequenceClassification from brutusxu +author: John Snow Labs +name: deberta_v3_large_sentiment_mixture_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_sentiment_mixture_pipeline` is a English model originally trained by brutusxu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_sentiment_mixture_pipeline_en_5.5.0_3.0_1726031008416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_sentiment_mixture_pipeline_en_5.5.0_3.0_1726031008416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_large_sentiment_mixture_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_large_sentiment_mixture_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_sentiment_mixture_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/brutusxu/deberta-v3-large-sentiment-mixture + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-deproberta_2ndstage_en.md b/docs/_posts/ahmedlone127/2024-09-11-deproberta_2ndstage_en.md new file mode 100644 index 00000000000000..4cd969c923b9e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-deproberta_2ndstage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deproberta_2ndstage RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: deproberta_2ndstage +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deproberta_2ndstage` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deproberta_2ndstage_en_5.5.0_3.0_1726071894359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deproberta_2ndstage_en_5.5.0_3.0_1726071894359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the document and token annotation columns
sequenceClassifier = RoBertaForSequenceClassification.pretrained("deproberta_2ndstage","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("deproberta_2ndstage", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
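When the classifier is run over a larger DataFrame, a simple aggregation gives the label distribution; this continues from the Python example above and only relies on the `class` output column it defines.

```python
from pyspark.sql import functions as F

# Count how often each predicted label occurs across the dataset.
pipelineDF.select(F.explode("class.result").alias("label")) \
    .groupBy("label") \
    .count() \
    .show()
```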
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deproberta_2ndstage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/DepRoBERTa-2ndStage \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_german_cased_cimt_argument_type_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_german_cased_cimt_argument_type_pipeline_de.md new file mode 100644 index 00000000000000..4d083122ef3500 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_german_cased_cimt_argument_type_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German distilbert_base_german_cased_cimt_argument_type_pipeline pipeline DistilBertForSequenceClassification from juliaromberg +author: John Snow Labs +name: distilbert_base_german_cased_cimt_argument_type_pipeline +date: 2024-09-11 +tags: [de, open_source, pipeline, onnx] +task: Text Classification +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_german_cased_cimt_argument_type_pipeline` is a German model originally trained by juliaromberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_german_cased_cimt_argument_type_pipeline_de_5.5.0_3.0_1726014282233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_german_cased_cimt_argument_type_pipeline_de_5.5.0_3.0_1726014282233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_german_cased_cimt_argument_type_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_german_cased_cimt_argument_type_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_german_cased_cimt_argument_type_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|252.5 MB| + +## References + +https://huggingface.co/juliaromberg/distilbert-base-german-cased_cimt-argument-type + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_custom_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_custom_dataset_en.md new file mode 100644 index 00000000000000..4d9d939229e6c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_custom_dataset_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_custom_dataset DistilBertForSequenceClassification from pavithrav +author: John Snow Labs +name: distilbert_base_uncased_finetuned_custom_dataset +date: 2024-09-11 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_custom_dataset` is a English model originally trained by pavithrav. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_custom_dataset_en_5.5.0_3.0_1726028812357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_custom_dataset_en_5.5.0_3.0_1726028812357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_custom_dataset","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_custom_dataset","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
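+
+For single documents or small batches, a `LightPipeline` wrapper avoids the overhead of a full DataFrame transform. A sketch under the assumption that the fitted pipeline and `data` from the Python example above are available:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Wrap the fitted PipelineModel for fast, driver-side inference
+light = LightPipeline(pipeline.fit(data))
+
+# annotate() returns a dict mapping each output column to a list of results
+print(light.annotate("PUT YOUR STRING HERE")["class"])
+```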
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_custom_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/pavithrav/distilbert-base-uncased-finetuned-custom-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline_en.md new file mode 100644 index 00000000000000..1c6e831aea3bb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline pipeline DistilBertForSequenceClassification from shunjiang +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline` is a English model originally trained by shunjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline_en_5.5.0_3.0_1726017525509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline_en_5.5.0_3.0_1726017525509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_shunjiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shunjiang/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline_en.md new file mode 100644 index 00000000000000..07328070722c89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline pipeline DistilBertForSequenceClassification from Carlos0101 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline` is a English model originally trained by Carlos0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline_en_5.5.0_3.0_1726014938891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline_en_5.5.0_3.0_1726014938891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mrpc_carlos0101_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Carlos0101/distilbert-base-uncased-finetuned-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_ajika2102_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_ajika2102_en.md new file mode 100644 index 00000000000000..6a7a9a0098a994 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_ajika2102_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_ajika2102 DistilBertForQuestionAnswering from Ajika2102 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_ajika2102 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_ajika2102` is a English model originally trained by Ajika2102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_ajika2102_en_5.5.0_3.0_1726028705077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_ajika2102_en_5.5.0_3.0_1726028705077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_ajika2102","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_ajika2102", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
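+
+The extracted answer span ends up in the `answer` column. A brief sketch, assuming the `pipelineDF` produced by the Python example above, of how it might be read back:
+
+```python
+# "answer.result" holds the predicted span text
+pipelineDF.select("answer.result").show(truncate=False)
+
+# Inspect the full annotation (offsets, metadata) if more detail is needed;
+# available metadata keys depend on the annotator version
+pipelineDF.selectExpr("explode(answer) as ans") \
+    .select("ans.result", "ans.metadata") \
+    .show(truncate=False)
+```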
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_ajika2102| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Ajika2102/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_dshitkar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_dshitkar_pipeline_en.md new file mode 100644 index 00000000000000..032ea49ad28f38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_dshitkar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_dshitkar_pipeline pipeline DistilBertForQuestionAnswering from dshitkar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_dshitkar_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_dshitkar_pipeline` is a English model originally trained by dshitkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_dshitkar_pipeline_en_5.5.0_3.0_1726028853893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_dshitkar_pipeline_en_5.5.0_3.0_1726028853893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_dshitkar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_dshitkar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_dshitkar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/dshitkar/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_test2_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_test2_en.md new file mode 100644 index 00000000000000..2ff986b7c8b5c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_squad_test2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_test2 DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_test2 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_test2` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_test2_en_5.5.0_3.0_1726028853878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_test2_en_5.5.0_3.0_1726028853878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_test2","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_test2", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_test2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-uncased-finetuned-squad-test2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..7528450a55ff69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline_en_5.5.0_3.0_1726017587075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline_en_5.5.0_3.0_1726017587075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut5_plprefix0stlarge11_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut5_PLPrefix0stlarge11_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_finetuned_squadv2_naruto123321_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_finetuned_squadv2_naruto123321_pipeline_en.md new file mode 100644 index 00000000000000..14967dde2ae7aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_finetuned_squadv2_naruto123321_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_naruto123321_pipeline pipeline DistilBertForQuestionAnswering from Naruto123321 +author: John Snow Labs +name: distilbert_finetuned_squadv2_naruto123321_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_naruto123321_pipeline` is a English model originally trained by Naruto123321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_naruto123321_pipeline_en_5.5.0_3.0_1726028568338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_naruto123321_pipeline_en_5.5.0_3.0_1726028568338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_squadv2_naruto123321_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_squadv2_naruto123321_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_naruto123321_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Naruto123321/distilbert-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_sarcascm_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_sarcascm_classifier_en.md new file mode 100644 index 00000000000000..4b88ee957be795 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_sarcascm_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sarcascm_classifier DistilBertForSequenceClassification from brianellis1997 +author: John Snow Labs +name: distilbert_sarcascm_classifier +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sarcascm_classifier` is a English model originally trained by brianellis1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sarcascm_classifier_en_5.5.0_3.0_1726014732943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sarcascm_classifier_en_5.5.0_3.0_1726014732943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sarcascm_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sarcascm_classifier", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sarcascm_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/brianellis1997/distilbert-sarcascm-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_uhuy_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_uhuy_en.md new file mode 100644 index 00000000000000..06838837243791 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_uhuy_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_uhuy DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: distilbert_uhuy +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uhuy` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uhuy_en_5.5.0_3.0_1726016586518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uhuy_en_5.5.0_3.0_1726016586518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_uhuy","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_uhuy", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uhuy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tyavika/DistilBERT-uhuy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbertforsequenceclassification_6h_768dim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbertforsequenceclassification_6h_768dim_pipeline_en.md new file mode 100644 index 00000000000000..44ef436ba5439e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbertforsequenceclassification_6h_768dim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbertforsequenceclassification_6h_768dim_pipeline pipeline DistilBertForSequenceClassification from lxs1 +author: John Snow Labs +name: distilbertforsequenceclassification_6h_768dim_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertforsequenceclassification_6h_768dim_pipeline` is a English model originally trained by lxs1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertforsequenceclassification_6h_768dim_pipeline_en_5.5.0_3.0_1726052423087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertforsequenceclassification_6h_768dim_pipeline_en_5.5.0_3.0_1726052423087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbertforsequenceclassification_6h_768dim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbertforsequenceclassification_6h_768dim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertforsequenceclassification_6h_768dim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.3 MB| + +## References + +https://huggingface.co/lxs1/DistilBertForSequenceClassification_6h_768dim + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilroberta_base_etc_nlp_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilroberta_base_etc_nlp_en.md new file mode 100644 index 00000000000000..0399f6c29bf56f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilroberta_base_etc_nlp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_etc_nlp RoBertaForSequenceClassification from agi-css +author: John Snow Labs +name: distilroberta_base_etc_nlp +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_etc_nlp` is a English model originally trained by agi-css. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_etc_nlp_en_5.5.0_3.0_1726053483725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_etc_nlp_en_5.5.0_3.0_1726053483725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_etc_nlp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_etc_nlp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_etc_nlp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/agi-css/distilroberta-base-etc-nlp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-dummy_es.md b/docs/_posts/ahmedlone127/2024-09-11-dummy_es.md new file mode 100644 index 00000000000000..d5009a708ffabd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-dummy_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish dummy MarianTransformer from DTorregrosa +author: John Snow Labs +name: dummy +date: 2024-09-11 +tags: [es, open_source, onnx, translation, marian] +task: Translation +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy` is a Castilian, Spanish model originally trained by DTorregrosa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_es_5.5.0_3.0_1726049394516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_es_5.5.0_3.0_1726049394516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("dummy","es") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("dummy","es")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
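+
+Because the sentence detector splits the input first, the `translation` column holds one annotation per detected sentence. A small sketch, assuming the `pipelineDF` from the Python example above, for reading the translated text:
+
+```python
+from pyspark.sql import functions as F
+
+# One translated string per detected sentence
+pipelineDF.select(F.explode("translation.result").alias("translated_sentence")).show(truncate=False)
+
+# Or join the sentences back into a single translated document per row
+pipelineDF.select(F.concat_ws(" ", "translation.result").alias("translated_text")).show(truncate=False)
+```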
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|es| +|Size:|539.4 MB| + +## References + +https://huggingface.co/DTorregrosa/dummy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-dynasent_roberta_text_disagreement_predictor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-dynasent_roberta_text_disagreement_predictor_pipeline_en.md new file mode 100644 index 00000000000000..11d7ac25e92612 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-dynasent_roberta_text_disagreement_predictor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dynasent_roberta_text_disagreement_predictor_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: dynasent_roberta_text_disagreement_predictor_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dynasent_roberta_text_disagreement_predictor_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dynasent_roberta_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726095837080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dynasent_roberta_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726095837080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dynasent_roberta_text_disagreement_predictor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dynasent_roberta_text_disagreement_predictor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dynasent_roberta_text_disagreement_predictor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.0 MB| + +## References + +https://huggingface.co/RuyuanWan/Dynasent_RoBERTa_Text_Disagreement_Predictor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-fine_tune_spatial_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-fine_tune_spatial_pipeline_en.md new file mode 100644 index 00000000000000..d367b594d12317 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-fine_tune_spatial_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tune_spatial_pipeline pipeline RoBertaForQuestionAnswering from dflcmu +author: John Snow Labs +name: fine_tune_spatial_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_spatial_pipeline` is a English model originally trained by dflcmu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_spatial_pipeline_en_5.5.0_3.0_1726062229745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_spatial_pipeline_en_5.5.0_3.0_1726062229745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tune_spatial_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tune_spatial_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_spatial_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.5 MB| + +## References + +https://huggingface.co/dflcmu/fine_tune_spatial + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-fine_tuned_twitter_roberta_base_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-11-fine_tuned_twitter_roberta_base_sentiment_en.md new file mode 100644 index 00000000000000..89d7d0f7efb937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-fine_tuned_twitter_roberta_base_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_twitter_roberta_base_sentiment RoBertaForSequenceClassification from bright1 +author: John Snow Labs +name: fine_tuned_twitter_roberta_base_sentiment +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_twitter_roberta_base_sentiment` is a English model originally trained by bright1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_twitter_roberta_base_sentiment_en_5.5.0_3.0_1726060588431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_twitter_roberta_base_sentiment_en_5.5.0_3.0_1726060588431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("fine_tuned_twitter_roberta_base_sentiment","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fine_tuned_twitter_roberta_base_sentiment", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
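+
+Beyond the predicted label, the classifier's annotation metadata usually carries per-label confidence scores. A sketch, assuming the `pipelineDF` from the Python example above; the exact metadata keys depend on the model's label names:
+
+```python
+# Explode the "class" annotations and look at both the label and its metadata
+exploded = pipelineDF.selectExpr("text", "explode(class) as prediction")
+exploded.selectExpr("text", "prediction.result as label", "prediction.metadata as scores") \
+    .show(truncate=False)
+# The metadata map typically holds one confidence entry per label.
+```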
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_twitter_roberta_base_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/bright1/fine-tuned-twitter-Roberta-base-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-finetuned_hindi_tonga_tonga_islands_english_v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-finetuned_hindi_tonga_tonga_islands_english_v8_pipeline_en.md new file mode 100644 index 00000000000000..b9f25114b8c6fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-finetuned_hindi_tonga_tonga_islands_english_v8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_hindi_tonga_tonga_islands_english_v8_pipeline pipeline MarianTransformer from TestZee +author: John Snow Labs +name: finetuned_hindi_tonga_tonga_islands_english_v8_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_hindi_tonga_tonga_islands_english_v8_pipeline` is a English model originally trained by TestZee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_hindi_tonga_tonga_islands_english_v8_pipeline_en_5.5.0_3.0_1726047246739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_hindi_tonga_tonga_islands_english_v8_pipeline_en_5.5.0_3.0_1726047246739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_hindi_tonga_tonga_islands_english_v8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_hindi_tonga_tonga_islands_english_v8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_hindi_tonga_tonga_islands_english_v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|519.0 MB| + +## References + +https://huggingface.co/TestZee/FineTuned-hindi-to-english-V8 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-finetuning_model_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-finetuning_model_testing_pipeline_en.md new file mode 100644 index 00000000000000..22e81c0ea6ec31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-finetuning_model_testing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_model_testing_pipeline pipeline RoBertaForSequenceClassification from sadiiipc +author: John Snow Labs +name: finetuning_model_testing_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_model_testing_pipeline` is a English model originally trained by sadiiipc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_model_testing_pipeline_en_5.5.0_3.0_1726063612937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_model_testing_pipeline_en_5.5.0_3.0_1726063612937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_model_testing_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_model_testing_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_model_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.8 MB| + +## References + +https://huggingface.co/sadiiipc/finetuning-model-testing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-focusing_questions_en.md b/docs/_posts/ahmedlone127/2024-09-11-focusing_questions_en.md new file mode 100644 index 00000000000000..fe32dc7713e206 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-focusing_questions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English focusing_questions BertForSequenceClassification from stanford-nlpxed +author: John Snow Labs +name: focusing_questions +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`focusing_questions` is a English model originally trained by stanford-nlpxed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/focusing_questions_en_5.5.0_3.0_1726095242227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/focusing_questions_en_5.5.0_3.0_1726095242227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("focusing_questions","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("focusing_questions", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
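+
+When scoring larger datasets, throughput and memory use can be tuned on the annotator itself. A sketch of optional settings, assuming the same pipeline as in the Python example above; the values shown are illustrative, not recommendations from the model author:
+
+```python
+sequenceClassifier = BertForSequenceClassification.pretrained("focusing_questions", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class") \
+    .setBatchSize(16) \
+    .setMaxSentenceLength(128)  # tokens beyond this length are truncated
+```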
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|focusing_questions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/stanford-nlpxed/focusing-questions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-fre_med_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-fre_med_pipeline_en.md new file mode 100644 index 00000000000000..834e9cf41259d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-fre_med_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fre_med_pipeline pipeline MarianTransformer from rickySaka +author: John Snow Labs +name: fre_med_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fre_med_pipeline` is a English model originally trained by rickySaka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fre_med_pipeline_en_5.5.0_3.0_1726073868955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fre_med_pipeline_en_5.5.0_3.0_1726073868955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fre_med_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fre_med_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fre_med_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.7 MB| + +## References + +https://huggingface.co/rickySaka/fre-med + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-gal_ner_xlmr_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-gal_ner_xlmr_6_pipeline_en.md new file mode 100644 index 00000000000000..61ff0c9de6b3a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-gal_ner_xlmr_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_ner_xlmr_6_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_xlmr_6_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_xlmr_6_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_xlmr_6_pipeline_en_5.5.0_3.0_1726068688155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_xlmr_6_pipeline_en_5.5.0_3.0_1726068688155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_ner_xlmr_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_ner_xlmr_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_xlmr_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|808.6 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-xlmr-6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-gpu1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-gpu1_pipeline_en.md new file mode 100644 index 00000000000000..3efc3a43cc0b83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-gpu1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English gpu1_pipeline pipeline MPNetEmbeddings from Watwat100 +author: John Snow Labs +name: gpu1_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gpu1_pipeline` is a English model originally trained by Watwat100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gpu1_pipeline_en_5.5.0_3.0_1726092828017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gpu1_pipeline_en_5.5.0_3.0_1726092828017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gpu1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gpu1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
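
Besides `transform`, a `PretrainedPipeline` can also be run directly on plain strings with `annotate`, which is convenient for quick checks. A small sketch:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("gpu1_pipeline", lang="en")

# annotate() accepts a single string (or a list of strings) and returns a dict
# keyed by each stage's output column; only the textual results are included.
result = pipeline.annotate("I love spark-nlp")
print(result.keys())

# For the embedding vectors themselves, use fullAnnotate() or transform() on a DataFrame.
full = pipeline.fullAnnotate("I love spark-nlp")
```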
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gpu1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Watwat100/gpu1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-gpu2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-gpu2_pipeline_en.md new file mode 100644 index 00000000000000..194e34ee3bcf70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-gpu2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English gpu2_pipeline pipeline MPNetEmbeddings from Watwat100 +author: John Snow Labs +name: gpu2_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gpu2_pipeline` is a English model originally trained by Watwat100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gpu2_pipeline_en_5.5.0_3.0_1726092833508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gpu2_pipeline_en_5.5.0_3.0_1726092833508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gpu2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gpu2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gpu2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Watwat100/gpu2 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-grappa_large_jnt_en.md b/docs/_posts/ahmedlone127/2024-09-11-grappa_large_jnt_en.md new file mode 100644 index 00000000000000..2653f9534a9d7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-grappa_large_jnt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English grappa_large_jnt RoBertaEmbeddings from Salesforce +author: John Snow Labs +name: grappa_large_jnt +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`grappa_large_jnt` is a English model originally trained by Salesforce. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/grappa_large_jnt_en_5.5.0_3.0_1726066083574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/grappa_large_jnt_en_5.5.0_3.0_1726066083574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("grappa_large_jnt","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("grappa_large_jnt","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
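
After running the pipeline above, each token's vector is stored in the `embeddings` column set by `setOutputCol("embeddings")`. A short sketch of flattening it:

```python
# Explode the token-level annotations and keep the token text and its vector.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(5, truncate=80)
```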
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|grappa_large_jnt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Salesforce/grappa_large_jnt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-guwen_all_mpnet_base_v2_nepal_bhasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-guwen_all_mpnet_base_v2_nepal_bhasa_pipeline_en.md new file mode 100644 index 00000000000000..2846cb51d8c560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-guwen_all_mpnet_base_v2_nepal_bhasa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English guwen_all_mpnet_base_v2_nepal_bhasa_pipeline pipeline MPNetEmbeddings from RinaChen +author: John Snow Labs +name: guwen_all_mpnet_base_v2_nepal_bhasa_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`guwen_all_mpnet_base_v2_nepal_bhasa_pipeline` is a English model originally trained by RinaChen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/guwen_all_mpnet_base_v2_nepal_bhasa_pipeline_en_5.5.0_3.0_1726089150202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/guwen_all_mpnet_base_v2_nepal_bhasa_pipeline_en_5.5.0_3.0_1726089150202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("guwen_all_mpnet_base_v2_nepal_bhasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("guwen_all_mpnet_base_v2_nepal_bhasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|guwen_all_mpnet_base_v2_nepal_bhasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/RinaChen/GuWen-all-mpnet-base-v2-new + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-helsinki_danish_swedish_v16_en.md b/docs/_posts/ahmedlone127/2024-09-11-helsinki_danish_swedish_v16_en.md new file mode 100644 index 00000000000000..478e9c7e0c7b23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-helsinki_danish_swedish_v16_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English helsinki_danish_swedish_v16 MarianTransformer from Danieljacobsen +author: John Snow Labs +name: helsinki_danish_swedish_v16 +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_danish_swedish_v16` is a English model originally trained by Danieljacobsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v16_en_5.5.0_3.0_1726038016933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v16_en_5.5.0_3.0_1726038016933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("helsinki_danish_swedish_v16","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("helsinki_danish_swedish_v16","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
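
With the stages wired as above, the translated sentences land in the column produced by the MarianTransformer stage (`translation`). A brief look at the output:

```python
# Each row carries one translated string per detected sentence.
pipelineDF.select("translation.result").show(truncate=False)
```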
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_danish_swedish_v16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|496.8 MB| + +## References + +https://huggingface.co/Danieljacobsen/Helsinki-DA-SV-v16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-indobertweet_base_indonesian_sentiment_analysis_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-11-indobertweet_base_indonesian_sentiment_analysis_pipeline_id.md new file mode 100644 index 00000000000000..260fb5f6042454 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-indobertweet_base_indonesian_sentiment_analysis_pipeline_id.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Indonesian indobertweet_base_indonesian_sentiment_analysis_pipeline pipeline BertForSequenceClassification from Aardiiiiy +author: John Snow Labs +name: indobertweet_base_indonesian_sentiment_analysis_pipeline +date: 2024-09-11 +tags: [id, open_source, pipeline, onnx] +task: Text Classification +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobertweet_base_indonesian_sentiment_analysis_pipeline` is a Indonesian model originally trained by Aardiiiiy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobertweet_base_indonesian_sentiment_analysis_pipeline_id_5.5.0_3.0_1726015170250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobertweet_base_indonesian_sentiment_analysis_pipeline_id_5.5.0_3.0_1726015170250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("indobertweet_base_indonesian_sentiment_analysis_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("indobertweet_base_indonesian_sentiment_analysis_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
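
A quick usage sketch with an Indonesian sample sentence; the `text` input column and the `class` output column are assumptions based on the standalone classifier examples:

```python
from sparknlp.pretrained import PretrainedPipeline

# Illustrative Indonesian input ("I really like this product").
df = spark.createDataFrame([["Saya sangat suka produk ini"]]).toDF("text")

pipeline = PretrainedPipeline("indobertweet_base_indonesian_sentiment_analysis_pipeline", lang="id")
result = pipeline.transform(df)

# The classifier's output column name is assumed here; check result.printSchema() if it differs.
result.select("class.result").show(truncate=False)
```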
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobertweet_base_indonesian_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|414.1 MB| + +## References + +https://huggingface.co/Aardiiiiy/indobertweet-base-Indonesian-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-ironial52_roberta_es.md b/docs/_posts/ahmedlone127/2024-09-11-ironial52_roberta_es.md new file mode 100644 index 00000000000000..3b373f659c047a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-ironial52_roberta_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish ironial52_roberta RoBertaForSequenceClassification from l52mas +author: John Snow Labs +name: ironial52_roberta +date: 2024-09-11 +tags: [es, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ironial52_roberta` is a Castilian, Spanish model originally trained by l52mas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ironial52_roberta_es_5.5.0_3.0_1726082588188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ironial52_roberta_es_5.5.0_3.0_1726082588188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ironial52_roberta","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ironial52_roberta", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ironial52_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|468.2 MB| + +## References + +https://huggingface.co/l52mas/ironiaL52_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-klue_bert_base_sentiment_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-11-klue_bert_base_sentiment_pipeline_ko.md new file mode 100644 index 00000000000000..5983f7fe7d69a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-klue_bert_base_sentiment_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean klue_bert_base_sentiment_pipeline pipeline BertForSequenceClassification from hun3359 +author: John Snow Labs +name: klue_bert_base_sentiment_pipeline +date: 2024-09-11 +tags: [ko, open_source, pipeline, onnx] +task: Text Classification +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`klue_bert_base_sentiment_pipeline` is a Korean model originally trained by hun3359. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/klue_bert_base_sentiment_pipeline_ko_5.5.0_3.0_1726059564558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/klue_bert_base_sentiment_pipeline_ko_5.5.0_3.0_1726059564558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("klue_bert_base_sentiment_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("klue_bert_base_sentiment_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|klue_bert_base_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|414.8 MB| + +## References + +https://huggingface.co/hun3359/klue-bert-base-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-lab1_finetuning_mon_en.md b/docs/_posts/ahmedlone127/2024-09-11-lab1_finetuning_mon_en.md new file mode 100644 index 00000000000000..bc1e3618291ed1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-lab1_finetuning_mon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_finetuning_mon MarianTransformer from Viennes +author: John Snow Labs +name: lab1_finetuning_mon +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_mon` is a English model originally trained by Viennes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_mon_en_5.5.0_3.0_1726073802567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_mon_en_5.5.0_3.0_1726073802567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("lab1_finetuning_mon","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("lab1_finetuning_mon","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_mon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/Viennes/lab1_finetuning_mon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-lthien_bislama_ep_tra_bai_corsican_phuong_pipeline_nan.md b/docs/_posts/ahmedlone127/2024-09-11-lthien_bislama_ep_tra_bai_corsican_phuong_pipeline_nan.md new file mode 100644 index 00000000000000..a8b6607728800e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-lthien_bislama_ep_tra_bai_corsican_phuong_pipeline_nan.md @@ -0,0 +1,69 @@ +--- +layout: model +title: None lthien_bislama_ep_tra_bai_corsican_phuong_pipeline pipeline DistilBertForQuestionAnswering from hi113 +author: John Snow Labs +name: lthien_bislama_ep_tra_bai_corsican_phuong_pipeline +date: 2024-09-11 +tags: [nan, open_source, pipeline, onnx] +task: Question Answering +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lthien_bislama_ep_tra_bai_corsican_phuong_pipeline` is a None model originally trained by hi113. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lthien_bislama_ep_tra_bai_corsican_phuong_pipeline_nan_5.5.0_3.0_1726088239959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lthien_bislama_ep_tra_bai_corsican_phuong_pipeline_nan_5.5.0_3.0_1726088239959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lthien_bislama_ep_tra_bai_corsican_phuong_pipeline", lang = "nan") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lthien_bislama_ep_tra_bai_corsican_phuong_pipeline", lang = "nan") +val annotations = pipeline.transform(df) + +``` +
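
Question-answering pipelines start from a MultiDocumentAssembler, so the input needs two columns. A sketch under the assumption that the assembler reads columns named `question` and `context` and that the span classifier writes to `answer`:

```python
from sparknlp.pretrained import PretrainedPipeline

# Assumed input column names: "question" and "context".
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("lthien_bislama_ep_tra_bai_corsican_phuong_pipeline", lang="nan")
result = pipeline.transform(df)

# Assumed output column name: "answer"; inspect result.printSchema() to confirm.
result.select("answer.result").show(truncate=False)
```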
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lthien_bislama_ep_tra_bai_corsican_phuong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nan| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hi113/ltHien_Bi_Ep_Tra_Bai_Co_Phuong + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-mi_tinyroberta_effect_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-mi_tinyroberta_effect_pipeline_en.md new file mode 100644 index 00000000000000..920e9b65ee88e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-mi_tinyroberta_effect_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mi_tinyroberta_effect_pipeline pipeline RoBertaForQuestionAnswering from Juncodh +author: John Snow Labs +name: mi_tinyroberta_effect_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mi_tinyroberta_effect_pipeline` is a English model originally trained by Juncodh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mi_tinyroberta_effect_pipeline_en_5.5.0_3.0_1726036359216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mi_tinyroberta_effect_pipeline_en_5.5.0_3.0_1726036359216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mi_tinyroberta_effect_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mi_tinyroberta_effect_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mi_tinyroberta_effect_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.2 MB| + +## References + +https://huggingface.co/Juncodh/mi_tinyROBERTA_effect + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-model_opus_maltese_english_multiple_languages_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-11-model_opus_maltese_english_multiple_languages_finetuned_en.md new file mode 100644 index 00000000000000..9ae8608769a394 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-model_opus_maltese_english_multiple_languages_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_opus_maltese_english_multiple_languages_finetuned MarianTransformer from Miguelcj1 +author: John Snow Labs +name: model_opus_maltese_english_multiple_languages_finetuned +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_opus_maltese_english_multiple_languages_finetuned` is a English model originally trained by Miguelcj1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_opus_maltese_english_multiple_languages_finetuned_en_5.5.0_3.0_1726037391629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_opus_maltese_english_multiple_languages_finetuned_en_5.5.0_3.0_1726037391629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("model_opus_maltese_english_multiple_languages_finetuned","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("model_opus_maltese_english_multiple_languages_finetuned","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_opus_maltese_english_multiple_languages_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|530.4 MB| + +## References + +https://huggingface.co/Miguelcj1/model_opus-mt-en-mul-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-modelolongformerbecas_en.md b/docs/_posts/ahmedlone127/2024-09-11-modelolongformerbecas_en.md new file mode 100644 index 00000000000000..474b2fee0dc60f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-modelolongformerbecas_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English modelolongformerbecas RoBertaForQuestionAnswering from jonasaid +author: John Snow Labs +name: modelolongformerbecas +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelolongformerbecas` is a English model originally trained by jonasaid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelolongformerbecas_en_5.5.0_3.0_1726058067802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelolongformerbecas_en_5.5.0_3.0_1726058067802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("modelolongformerbecas","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("modelolongformerbecas", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
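
After transforming the data above, the predicted span is available in the `answer` column set on the span classifier:

```python
# One annotation per question/context pair; result holds the extracted answer text.
pipelineDF.select("answer.result").show(truncate=False)
```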
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelolongformerbecas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|473.3 MB| + +## References + +https://huggingface.co/jonasaid/modeloLongformerBecas \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full_en.md b/docs/_posts/ahmedlone127/2024-09-11-multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full_en.md new file mode 100644 index 00000000000000..314aaf7c53f5fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full MPNetEmbeddings from checkiejan +author: John Snow Labs +name: multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full` is a English model originally trained by checkiejan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full_en_5.5.0_3.0_1726088741430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full_en_5.5.0_3.0_1726088741430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
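
The sentence-level vector produced above can be pulled out of the `embeddings` column, for example to feed a downstream similarity search. A short sketch:

```python
# One MPNet annotation per document; its "embeddings" field holds the 768-dimensional vector.
rows = pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.embeddings as vector") \
    .collect()

vector = rows[0]["vector"]
print(len(vector), vector[:5])
```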
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_qa_mpnet_base_dot_v1_covidqa_search_75_25_2epoch_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/checkiejan/multi-qa-mpnet-base-dot-v1-covidqa-search-75-25-2epoch-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-ner_multilingual_bert_xx.md b/docs/_posts/ahmedlone127/2024-09-11-ner_multilingual_bert_xx.md new file mode 100644 index 00000000000000..99ed7e8ee52c9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-ner_multilingual_bert_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual ner_multilingual_bert BertForTokenClassification from textminr +author: John Snow Labs +name: ner_multilingual_bert +date: 2024-09-11 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_multilingual_bert` is a Multilingual model originally trained by textminr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_multilingual_bert_xx_5.5.0_3.0_1726026999829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_multilingual_bert_xx_5.5.0_3.0_1726026999829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("ner_multilingual_bert","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("ner_multilingual_bert", "xx")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
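
The token-level predictions land in the `ner` column defined above, one annotation per token. A minimal sketch of flattening them:

```python
# result holds the predicted tag; begin/end give the character span of the token.
pipelineDF.selectExpr("explode(ner) as entity") \
    .selectExpr("entity.begin", "entity.end", "entity.result as ner_label") \
    .show(truncate=False)
```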
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_multilingual_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/textminr/ner-multilingual-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey_en.md b/docs/_posts/ahmedlone127/2024-09-11-nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey_en.md new file mode 100644 index 00000000000000..1944aed12c3825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey RoBertaForSequenceClassification from ahkrey +author: John Snow Labs +name: nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey` is a English model originally trained by ahkrey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey_en_5.5.0_3.0_1726097055725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey_en_5.5.0_3.0_1726097055725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nli_roberta_base_finetuned_for_amazon_review_ratings_ahkrey| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/ahkrey/nli-roberta-base-finetuned-for-amazon-review-ratings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_big_enfr_ft_wang_2022_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_big_enfr_ft_wang_2022_en.md new file mode 100644 index 00000000000000..ab4803e97980e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_big_enfr_ft_wang_2022_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_big_enfr_ft_wang_2022 MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_big_enfr_ft_wang_2022 +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_big_enfr_ft_wang_2022` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_big_enfr_ft_wang_2022_en_5.5.0_3.0_1726037558592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_big_enfr_ft_wang_2022_en_5.5.0_3.0_1726037558592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_big_enfr_ft_wang_2022","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_big_enfr_ft_wang_2022","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_big_enfr_ft_wang_2022| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ethansimrm/opus_big_enfr_FT_wang_2022 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_big_lsp_aon_wce_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_big_lsp_aon_wce_pipeline_en.md new file mode 100644 index 00000000000000..f13d9c30d6d7dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_big_lsp_aon_wce_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_big_lsp_aon_wce_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_big_lsp_aon_wce_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_big_lsp_aon_wce_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_big_lsp_aon_wce_pipeline_en_5.5.0_3.0_1726049236175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_big_lsp_aon_wce_pipeline_en_5.5.0_3.0_1726049236175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_big_lsp_aon_wce_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_big_lsp_aon_wce_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_big_lsp_aon_wce_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ethansimrm/opus_big_lsp_AoN_wce + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_chinese_english_model_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_chinese_english_model_pipeline_zh.md new file mode 100644 index 00000000000000..0a309e36d554ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_chinese_english_model_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese opus_maltese_chinese_english_model_pipeline pipeline MarianTransformer from Varine +author: John Snow Labs +name: opus_maltese_chinese_english_model_pipeline +date: 2024-09-11 +tags: [zh, open_source, pipeline, onnx] +task: Translation +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_chinese_english_model_pipeline` is a Chinese model originally trained by Varine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_chinese_english_model_pipeline_zh_5.5.0_3.0_1726073868429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_chinese_english_model_pipeline_zh_5.5.0_3.0_1726073868429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_chinese_english_model_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_chinese_english_model_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_chinese_english_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|539.8 MB| + +## References + +https://huggingface.co/Varine/opus-mt-zh-en-model + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline_en.md new file mode 100644 index 00000000000000..aaa87821cd2751 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline pipeline MarianTransformer from MaryaAI +author: John Snow Labs +name: opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline` is a English model originally trained by MaryaAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1726074042180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1726074042180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_arabic_finetuned_dummydata_10_10_arabic_tonga_tonga_islands_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.8 MB| + +## References + +https://huggingface.co/MaryaAI/opus-mt-en-ar-finetuned-dummyData-10-10-ar-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_bkm_final_60_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_bkm_final_60_pipeline_en.md new file mode 100644 index 00000000000000..9c5c3a8cd3cef4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_bkm_final_60_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_bkm_final_60_pipeline pipeline MarianTransformer from kalese +author: John Snow Labs +name: opus_maltese_english_bkm_final_60_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_bkm_final_60_pipeline` is a English model originally trained by kalese. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_bkm_final_60_pipeline_en_5.5.0_3.0_1726047504432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_bkm_final_60_pipeline_en_5.5.0_3.0_1726047504432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_bkm_final_60_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_bkm_final_60_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
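+
+For quick checks on a handful of strings, `PretrainedPipeline` also offers an `annotate` convenience method that returns plain Python structures instead of a DataFrame. The sketch below assumes Spark NLP has been started in the current session.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("opus_maltese_english_bkm_final_60_pipeline", lang = "en")
+
+# annotate() accepts a single string (or a list of strings) and returns
+# a dict keyed by the pipeline's output columns.
+result = pipeline.annotate("I love spark-nlp")
+print(result.keys())
+```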
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_bkm_final_60_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/kalese/opus-mt-en-bkm-Final-60 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp_en.md new file mode 100644 index 00000000000000..b220a863a22f6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp MarianTransformer from RedPandaAINLP +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp` is a English model originally trained by RedPandaAINLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp_en_5.5.0_3.0_1726038283308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp_en_5.5.0_3.0_1726038283308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
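+
+After `transform`, the translated text sits inside the `translation` annotation column configured above. One way to pull out the plain strings with standard Spark SQL (a sketch) is:
+
+```python
+# Each row carries an array of annotations; explode it and keep the result text.
+pipelineDF \
+    .selectExpr("explode(translation.result) as translated_text") \
+    .show(truncate=False)
+```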
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_redpandaainlp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/RedPandaAINLP/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12_en.md new file mode 100644 index 00000000000000..790f384ebaa2ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12 MarianTransformer from rohan12 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12 +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12` is a English model originally trained by rohan12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12_en_5.5.0_3.0_1726049986655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12_en_5.5.0_3.0_1726049986655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_rohan12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/rohan12/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline_en.md new file mode 100644 index 00000000000000..b6ae11a632bf66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline pipeline MarianTransformer from slimamel +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline` is a English model originally trained by slimamel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline_en_5.5.0_3.0_1726047390106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline_en_5.5.0_3.0_1726047390106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_slimamel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/slimamel/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline_en.md new file mode 100644 index 00000000000000..0cd75743946c6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline pipeline MarianTransformer from indra3199 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline` is a English model originally trained by indra3199. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline_en_5.5.0_3.0_1726049855367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline_en_5.5.0_3.0_1726049855367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_german_tonga_tonga_islands_english_indra3199_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/indra3199/opus-mt-en-ro-finetuned-de-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline_en.md new file mode 100644 index 00000000000000..13ac1470ec2294 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline pipeline MarianTransformer from iaanimashaun +author: John Snow Labs +name: opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline` is a English model originally trained by iaanimashaun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline_en_5.5.0_3.0_1726037836032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline_en_5.5.0_3.0_1726037836032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_swahili_finetuned_english_tonga_tonga_islands_swahili_iaanimashaun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.0 MB| + +## References + +https://huggingface.co/iaanimashaun/opus-mt-en-sw-finetuned-en-to-sw + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline_en.md new file mode 100644 index 00000000000000..f1f4209a16edd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline pipeline MarianTransformer from KitoEver +author: John Snow Labs +name: opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline` is a English model originally trained by KitoEver. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1726038656353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1726038656353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_ganda_english_finetuned_kirundi_tonga_tonga_islands_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|512.9 MB| + +## References + +https://huggingface.co/KitoEver/opus-mt-lg-en-finetuned-rn-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline_en.md new file mode 100644 index 00000000000000..2409128f4a5165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline pipeline MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline_en_5.5.0_3.0_1726047179811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline_en_5.5.0_3.0_1726047179811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_romance_english_finetuned_npomo_english_10_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|539.5 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-ROMANCE-en-finetuned-npomo-en-10-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_tatoeba_eng_yor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_tatoeba_eng_yor_pipeline_en.md new file mode 100644 index 00000000000000..1e0bc22ee54870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_tatoeba_eng_yor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_tatoeba_eng_yor_pipeline pipeline MarianTransformer from omoekan +author: John Snow Labs +name: opus_tatoeba_eng_yor_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_tatoeba_eng_yor_pipeline` is a English model originally trained by omoekan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_tatoeba_eng_yor_pipeline_en_5.5.0_3.0_1726038451398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_tatoeba_eng_yor_pipeline_en_5.5.0_3.0_1726038451398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_tatoeba_eng_yor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_tatoeba_eng_yor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_tatoeba_eng_yor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|186.4 MB| + +## References + +https://huggingface.co/omoekan/opus-tatoeba-eng-yor + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_wmt18_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_wmt18_finetune_en.md new file mode 100644 index 00000000000000..88f0d8c8c08ba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_wmt18_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_wmt18_finetune MarianTransformer from nomsgadded +author: John Snow Labs +name: opus_wmt18_finetune +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_wmt18_finetune` is a English model originally trained by nomsgadded. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_wmt18_finetune_en_5.5.0_3.0_1726038470896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_wmt18_finetune_en_5.5.0_3.0_1726038470896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_wmt18_finetune","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_wmt18_finetune","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
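+
+For low-latency, single-document use, the fitted `pipelineModel` can also be wrapped in a `LightPipeline`, which avoids building a DataFrame for every request. This is a sketch; the dictionary keys mirror the output columns configured above.
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# annotate() returns plain Python lists/dicts instead of a DataFrame.
+result = light.annotate("I love spark-nlp")
+print(result["translation"])
+```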
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_wmt18_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|541.2 MB| + +## References + +https://huggingface.co/nomsgadded/opus_wmt18_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-phishing_email_detection_21_07_en.md b/docs/_posts/ahmedlone127/2024-09-11-phishing_email_detection_21_07_en.md new file mode 100644 index 00000000000000..9684a591b1eff5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-phishing_email_detection_21_07_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phishing_email_detection_21_07 RoBertaForSequenceClassification from kamikaze20 +author: John Snow Labs +name: phishing_email_detection_21_07 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phishing_email_detection_21_07` is a English model originally trained by kamikaze20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phishing_email_detection_21_07_en_5.5.0_3.0_1726022539083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phishing_email_detection_21_07_en_5.5.0_3.0_1726022539083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("phishing_email_detection_21_07","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("phishing_email_detection_21_07", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
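+
+The predicted label is written to the `class` annotation column; a minimal way to surface it next to the input text (a sketch) is:
+
+```python
+# "class.result" holds the predicted label(s) for each input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```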
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phishing_email_detection_21_07| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|450.8 MB| + +## References + +https://huggingface.co/kamikaze20/phishing-email-detection_21_07 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-praise_feedback_en.md b/docs/_posts/ahmedlone127/2024-09-11-praise_feedback_en.md new file mode 100644 index 00000000000000..a79048c4a191e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-praise_feedback_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English praise_feedback RoBertaForSequenceClassification from meiflwr +author: John Snow Labs +name: praise_feedback +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`praise_feedback` is a English model originally trained by meiflwr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/praise_feedback_en_5.5.0_3.0_1726082460496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/praise_feedback_en_5.5.0_3.0_1726082460496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("praise_feedback","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("praise_feedback", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|praise_feedback| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|421.5 MB| + +## References + +https://huggingface.co/meiflwr/praise_feedback \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-q2e_5_en.md b/docs/_posts/ahmedlone127/2024-09-11-q2e_5_en.md new file mode 100644 index 00000000000000..a07977dc83c2b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-q2e_5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English q2e_5 MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2e_5 +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2e_5` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2e_5_en_5.5.0_3.0_1726055036197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2e_5_en_5.5.0_3.0_1726055036197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("q2e_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("q2e_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
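+
+Each input document yields one MPNet sentence embedding in the `embeddings` column; the raw float vectors can be inspected with standard Spark SQL (a sketch):
+
+```python
+# The annotation's "embeddings" field holds the dense vector for each document.
+pipelineDF \
+    .selectExpr("explode(embeddings.embeddings) as sentence_embedding") \
+    .show(truncate=80)
+```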
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2e_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-rbt4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-rbt4_pipeline_en.md new file mode 100644 index 00000000000000..f39b9330a8dcac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-rbt4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rbt4_pipeline pipeline RoBertaForQuestionAnswering from SUTS102779289 +author: John Snow Labs +name: rbt4_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rbt4_pipeline` is a English model originally trained by SUTS102779289. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rbt4_pipeline_en_5.5.0_3.0_1726055765663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rbt4_pipeline_en_5.5.0_3.0_1726055765663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rbt4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rbt4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rbt4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/SUTS102779289/rbt4 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..a398bdf26fb724 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_pipeline pipeline RoBertaEmbeddings from GusNicho +author: John Snow Labs +name: roberta_base_finetuned_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_pipeline` is a English model originally trained by GusNicho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_pipeline_en_5.5.0_3.0_1726065709755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_pipeline_en_5.5.0_3.0_1726065709755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
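+
+Because the annotation column names are fixed inside the pretrained pipeline rather than in user code, it is worth listing them before selecting results. A short sketch, assuming an active `spark` session:
+
+```python
+# Build a one-row test DataFrame and inspect the columns the pipeline adds.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+print(annotations.columns)
+```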
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/GusNicho/roberta-base-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_vedantgaur_human_generated_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_vedantgaur_human_generated_pipeline_en.md new file mode 100644 index 00000000000000..21bbf31342f109 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_vedantgaur_human_generated_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_vedantgaur_human_generated_pipeline pipeline RoBertaForSequenceClassification from SkwarczynskiP +author: John Snow Labs +name: roberta_base_finetuned_vedantgaur_human_generated_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_vedantgaur_human_generated_pipeline` is a English model originally trained by SkwarczynskiP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_vedantgaur_human_generated_pipeline_en_5.5.0_3.0_1726061148240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_vedantgaur_human_generated_pipeline_en_5.5.0_3.0_1726061148240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_vedantgaur_human_generated_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_vedantgaur_human_generated_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_vedantgaur_human_generated_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|417.1 MB| + +## References + +https://huggingface.co/SkwarczynskiP/roberta-base-finetuned-vedantgaur-human-generated + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_prop_16_train_set_scottstots_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_prop_16_train_set_scottstots_pipeline_en.md new file mode 100644 index 00000000000000..346d0ddec7e65e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_prop_16_train_set_scottstots_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_prop_16_train_set_scottstots_pipeline pipeline RoBertaForSequenceClassification from scottstots +author: John Snow Labs +name: roberta_base_prop_16_train_set_scottstots_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_prop_16_train_set_scottstots_pipeline` is a English model originally trained by scottstots. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_prop_16_train_set_scottstots_pipeline_en_5.5.0_3.0_1726089706388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_prop_16_train_set_scottstots_pipeline_en_5.5.0_3.0_1726089706388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_prop_16_train_set_scottstots_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_prop_16_train_set_scottstots_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_prop_16_train_set_scottstots_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|417.4 MB| + +## References + +https://huggingface.co/scottstots/roberta-base-prop-16-train-set + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_text_classification_emotion_adamw_epochs_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_text_classification_emotion_adamw_epochs_7_pipeline_en.md new file mode 100644 index 00000000000000..8e15d9144fa2b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_text_classification_emotion_adamw_epochs_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_text_classification_emotion_adamw_epochs_7_pipeline pipeline RoBertaForSequenceClassification from galkowskim +author: John Snow Labs +name: roberta_base_text_classification_emotion_adamw_epochs_7_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_text_classification_emotion_adamw_epochs_7_pipeline` is a English model originally trained by galkowskim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_text_classification_emotion_adamw_epochs_7_pipeline_en_5.5.0_3.0_1726061014567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_text_classification_emotion_adamw_epochs_7_pipeline_en_5.5.0_3.0_1726061014567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_text_classification_emotion_adamw_epochs_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_text_classification_emotion_adamw_epochs_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_text_classification_emotion_adamw_epochs_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.5 MB| + +## References + +https://huggingface.co/galkowskim/roberta-base-text-classification-emotion-adamw-epochs-7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_turkish_uncased_turkcell_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_turkish_uncased_turkcell_pipeline_tr.md new file mode 100644 index 00000000000000..9285237f7fd937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_turkish_uncased_turkcell_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish roberta_base_turkish_uncased_turkcell_pipeline pipeline RoBertaEmbeddings from TURKCELL +author: John Snow Labs +name: roberta_base_turkish_uncased_turkcell_pipeline +date: 2024-09-11 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_turkish_uncased_turkcell_pipeline` is a Turkish model originally trained by TURKCELL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_turkcell_pipeline_tr_5.5.0_3.0_1726024253882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_turkcell_pipeline_tr_5.5.0_3.0_1726024253882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_turkish_uncased_turkcell_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_turkish_uncased_turkcell_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_turkish_uncased_turkcell_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|461.5 MB| + +## References + +https://huggingface.co/TURKCELL/roberta-base-turkish-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_cord19_1m7k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_cord19_1m7k_pipeline_en.md new file mode 100644 index 00000000000000..fa8bb25a5b70a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_cord19_1m7k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cord19_1m7k_pipeline pipeline RoBertaEmbeddings from amoux +author: John Snow Labs +name: roberta_cord19_1m7k_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cord19_1m7k_pipeline` is a English model originally trained by amoux. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cord19_1m7k_pipeline_en_5.5.0_3.0_1726065986010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cord19_1m7k_pipeline_en_5.5.0_3.0_1726065986010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_cord19_1m7k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_cord19_1m7k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cord19_1m7k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|365.0 MB| + +## References + +https://huggingface.co/amoux/roberta-cord19-1M7k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_mrqa_plus_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_mrqa_plus_en.md new file mode 100644 index 00000000000000..31166d86d0c2e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_mrqa_plus_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_mrqa_plus RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_mrqa_plus +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mrqa_plus` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mrqa_plus_en_5.5.0_3.0_1726036270186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mrqa_plus_en_5.5.0_3.0_1726036270186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_plus","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_plus", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
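+Once the pipeline has run, the predicted span is stored in the `answer` annotation column. A minimal sketch of reading it back, assuming the `pipelineDF` produced above:
+
+```python
+# Each annotation's `result` field holds the predicted answer text
+pipelineDF.selectExpr("explode(answer.result) as predicted_answer").show(truncate=False)
+```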
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mrqa_plus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|465.1 MB| + +## References + +https://huggingface.co/enriquesaou/roberta-mrqa-plus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_temporal_predictor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_temporal_predictor_pipeline_en.md new file mode 100644 index 00000000000000..9a5e19c6bc6330 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_temporal_predictor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_temporal_predictor_pipeline pipeline RoBertaEmbeddings from CogComp +author: John Snow Labs +name: roberta_temporal_predictor_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_temporal_predictor_pipeline` is a English model originally trained by CogComp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_temporal_predictor_pipeline_en_5.5.0_3.0_1726094163079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_temporal_predictor_pipeline_en_5.5.0_3.0_1726094163079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_temporal_predictor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_temporal_predictor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
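+For ad-hoc inspection, the pretrained pipeline can also be applied to raw strings instead of a DataFrame. A small sketch (the example sentence is illustrative):
+
+```python
+# fullAnnotate returns one dictionary of annotations per input string
+annotated = pipeline.fullAnnotate("The storm hit before the flight departed.")
+print(list(annotated[0].keys()))  # output column names produced by the pipeline stages
+```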
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_temporal_predictor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/CogComp/roberta-temporal-predictor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_vmw_mrqa_old_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_vmw_mrqa_old_en.md new file mode 100644 index 00000000000000..d5837da2698c26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_vmw_mrqa_old_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_vmw_mrqa_old RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_vmw_mrqa_old +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_vmw_mrqa_old` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_vmw_mrqa_old_en_5.5.0_3.0_1726036756395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_vmw_mrqa_old_en_5.5.0_3.0_1726036756395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_vmw_mrqa_old","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_vmw_mrqa_old", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_vmw_mrqa_old| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/enriquesaou/roberta-vmw-mrqa-old \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-romanian_finetuned_winogrande83e_058_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-romanian_finetuned_winogrande83e_058_pipeline_en.md new file mode 100644 index 00000000000000..d2ea8ac2d57f87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-romanian_finetuned_winogrande83e_058_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English romanian_finetuned_winogrande83e_058_pipeline pipeline RoBertaForSequenceClassification from Kalslice +author: John Snow Labs +name: romanian_finetuned_winogrande83e_058_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`romanian_finetuned_winogrande83e_058_pipeline` is a English model originally trained by Kalslice. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/romanian_finetuned_winogrande83e_058_pipeline_en_5.5.0_3.0_1726096333712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/romanian_finetuned_winogrande83e_058_pipeline_en_5.5.0_3.0_1726096333712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("romanian_finetuned_winogrande83e_058_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("romanian_finetuned_winogrande83e_058_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|romanian_finetuned_winogrande83e_058_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.0 MB| + +## References + +https://huggingface.co/Kalslice/ro-finetuned-winogrande83e-058 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-rtmex23_pol2_ptl_v5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-rtmex23_pol2_ptl_v5_pipeline_en.md new file mode 100644 index 00000000000000..99648c14f61707 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-rtmex23_pol2_ptl_v5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rtmex23_pol2_ptl_v5_pipeline pipeline RoBertaForSequenceClassification from javilonso +author: John Snow Labs +name: rtmex23_pol2_ptl_v5_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rtmex23_pol2_ptl_v5_pipeline` is a English model originally trained by javilonso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rtmex23_pol2_ptl_v5_pipeline_en_5.5.0_3.0_1726082531249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rtmex23_pol2_ptl_v5_pipeline_en_5.5.0_3.0_1726082531249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rtmex23_pol2_ptl_v5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rtmex23_pol2_ptl_v5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rtmex23_pol2_ptl_v5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/javilonso/rtmex23-pol2-ptl_v5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-ruspamns_v1_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-11-ruspamns_v1_pipeline_ru.md new file mode 100644 index 00000000000000..fc1765807e12e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-ruspamns_v1_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian ruspamns_v1_pipeline pipeline BertForSequenceClassification from NeuroSpaceX +author: John Snow Labs +name: ruspamns_v1_pipeline +date: 2024-09-11 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruspamns_v1_pipeline` is a Russian model originally trained by NeuroSpaceX. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruspamns_v1_pipeline_ru_5.5.0_3.0_1726015207543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruspamns_v1_pipeline_ru_5.5.0_3.0_1726015207543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ruspamns_v1_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ruspamns_v1_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruspamns_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/NeuroSpaceX/ruSpamNS_V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_en.md b/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_en.md new file mode 100644 index 00000000000000..0b47e6169d9506 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_en_5.5.0_3.0_1726061287521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_en_5.5.0_3.0_1726061287521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
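+To read the predicted labels back out of `pipelineDF`, a minimal sketch based on the column names used above:
+
+```python
+# `class.result` holds the predicted label for each input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+```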
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/SA-tweet-bert-large-e6-w1-1.5-b16-m4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline_en.md new file mode 100644 index 00000000000000..b03449fdb8dbee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline_en_5.5.0_3.0_1726061354934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline_en_5.5.0_3.0_1726061354934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sanskrit_saskta_tweet_bert_large_e6_w1_1_5_b16_m4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/SA-tweet-bert-large-e6-w1-1.5-b16-m4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-saved_model_body_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-saved_model_body_pipeline_en.md new file mode 100644 index 00000000000000..ada27cd4b79022 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-saved_model_body_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English saved_model_body_pipeline pipeline RoBertaForSequenceClassification from Akhil0-o +author: John Snow Labs +name: saved_model_body_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`saved_model_body_pipeline` is a English model originally trained by Akhil0-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/saved_model_body_pipeline_en_5.5.0_3.0_1726060843216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/saved_model_body_pipeline_en_5.5.0_3.0_1726060843216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("saved_model_body_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("saved_model_body_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|saved_model_body_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.0 MB| + +## References + +https://huggingface.co/Akhil0-o/saved_model_body + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_hausa_xx.md b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_hausa_xx.md new file mode 100644 index 00000000000000..342d129847cc92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_hausa_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_hausa BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_hausa +date: 2024-09-11 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_hausa` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_hausa_xx_5.5.0_3.0_1726056548655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_hausa_xx_5.5.0_3.0_1726056548655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_hausa","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_hausa","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
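+A short sketch of retrieving the sentence vectors from the `embeddings` column produced above (one vector per detected sentence):
+
+```python
+# Each sentence-level annotation exposes its vector in the `embeddings` field
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(1, truncate=80)
+```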
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_hausa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|664.4 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentiment_roberta_e6_b16_v2_w0_01_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentiment_roberta_e6_b16_v2_w0_01_en.md new file mode 100644 index 00000000000000..3b9c3d73fc0533 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentiment_roberta_e6_b16_v2_w0_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_roberta_e6_b16_v2_w0_01 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sentiment_roberta_e6_b16_v2_w0_01 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_e6_b16_v2_w0_01` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_e6_b16_v2_w0_01_en_5.5.0_3.0_1726082404079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_e6_b16_v2_w0_01_en_5.5.0_3.0_1726082404079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_e6_b16_v2_w0_01","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_e6_b16_v2_w0_01", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_e6_b16_v2_w0_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/sentiment-roberta-e6-b16-v2-w0.01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentiment_roberta_twitter_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentiment_roberta_twitter_pipeline_en.md new file mode 100644 index 00000000000000..4d4db239da7d96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentiment_roberta_twitter_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_roberta_twitter_pipeline pipeline RoBertaForSequenceClassification from researchworkai +author: John Snow Labs +name: sentiment_roberta_twitter_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_twitter_pipeline` is a English model originally trained by researchworkai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_twitter_pipeline_en_5.5.0_3.0_1726022740724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_twitter_pipeline_en_5.5.0_3.0_1726022740724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_roberta_twitter_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_roberta_twitter_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_twitter_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/researchworkai/Sentiment-roBERTa-Twitter + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-setfit_industry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-setfit_industry_pipeline_en.md new file mode 100644 index 00000000000000..e8e7e27c983022 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-setfit_industry_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English setfit_industry_pipeline pipeline MPNetEmbeddings from BahaArfaoui +author: John Snow Labs +name: setfit_industry_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_industry_pipeline` is a English model originally trained by BahaArfaoui. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_industry_pipeline_en_5.5.0_3.0_1726089213572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_industry_pipeline_en_5.5.0_3.0_1726089213572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("setfit_industry_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("setfit_industry_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_industry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/BahaArfaoui/setfit_industry + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline_en.md new file mode 100644 index 00000000000000..4f4307d3899d01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline_en_5.5.0_3.0_1726051935281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline_en_5.5.0_3.0_1726051935281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_30_2024_07_26_14_26_52_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-30-2024-07-26_14-26-52 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-ternary_persian_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-ternary_persian_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..d42b9df67d3935 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-ternary_persian_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ternary_persian_sentiment_analysis_pipeline pipeline BertForSequenceClassification from rezaFarsh +author: John Snow Labs +name: ternary_persian_sentiment_analysis_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ternary_persian_sentiment_analysis_pipeline` is a English model originally trained by rezaFarsh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ternary_persian_sentiment_analysis_pipeline_en_5.5.0_3.0_1726095437557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ternary_persian_sentiment_analysis_pipeline_en_5.5.0_3.0_1726095437557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ternary_persian_sentiment_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ternary_persian_sentiment_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ternary_persian_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/rezaFarsh/ternary_persian_sentiment_analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-teroberta_en.md b/docs/_posts/ahmedlone127/2024-09-11-teroberta_en.md new file mode 100644 index 00000000000000..f413a3ca76b54b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-teroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English teroberta RoBertaEmbeddings from subbareddyiiit +author: John Snow Labs +name: teroberta +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teroberta` is a English model originally trained by subbareddyiiit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teroberta_en_5.5.0_3.0_1726031871722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teroberta_en_5.5.0_3.0_1726031871722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("teroberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("teroberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
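+To pair each token with its vector after running the pipeline above (a minimal sketch; column names follow the snippet):
+
+```python
+# Zip token strings with their corresponding word embeddings
+pipelineDF.selectExpr(
+    "explode(arrays_zip(embeddings.result, embeddings.embeddings)) as token_embedding"
+).show(5, truncate=80)
+```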
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|901.4 MB| + +## References + +https://huggingface.co/subbareddyiiit/TeRobeRta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-test_cwyhenry_en.md b/docs/_posts/ahmedlone127/2024-09-11-test_cwyhenry_en.md new file mode 100644 index 00000000000000..28e2ba783dd60a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-test_cwyhenry_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English test_cwyhenry DistilBertForQuestionAnswering from cwyhenry +author: John Snow Labs +name: test_cwyhenry +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_cwyhenry` is a English model originally trained by cwyhenry. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_cwyhenry_en_5.5.0_3.0_1726088132075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_cwyhenry_en_5.5.0_3.0_1726088132075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("test_cwyhenry","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("test_cwyhenry", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_cwyhenry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/cwyhenry/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-traineddistilbert_en.md b/docs/_posts/ahmedlone127/2024-09-11-traineddistilbert_en.md new file mode 100644 index 00000000000000..2bdae4ac3ad514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-traineddistilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English traineddistilbert DistilBertForTokenClassification from BetterCartAnalytics +author: John Snow Labs +name: traineddistilbert +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`traineddistilbert` is a English model originally trained by BetterCartAnalytics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/traineddistilbert_en_5.5.0_3.0_1726048768759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/traineddistilbert_en_5.5.0_3.0_1726048768759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("traineddistilbert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("traineddistilbert", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
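+A quick way to view the predicted tag for each token, assuming the `pipelineDF` and column names above:
+
+```python
+# Align each token with its NER label
+pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as token_tag").show(truncate=False)
+```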
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|traineddistilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/BetterCartAnalytics/TrainedDistilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-transfer_course_distilroberta_base_mrpc_glue_nestor_mamani_en.md b/docs/_posts/ahmedlone127/2024-09-11-transfer_course_distilroberta_base_mrpc_glue_nestor_mamani_en.md new file mode 100644 index 00000000000000..1bef28bfaa9260 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-transfer_course_distilroberta_base_mrpc_glue_nestor_mamani_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English transfer_course_distilroberta_base_mrpc_glue_nestor_mamani RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: transfer_course_distilroberta_base_mrpc_glue_nestor_mamani +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transfer_course_distilroberta_base_mrpc_glue_nestor_mamani` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transfer_course_distilroberta_base_mrpc_glue_nestor_mamani_en_5.5.0_3.0_1726082209363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transfer_course_distilroberta_base_mrpc_glue_nestor_mamani_en_5.5.0_3.0_1726082209363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("transfer_course_distilroberta_base_mrpc_glue_nestor_mamani","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("transfer_course_distilroberta_base_mrpc_glue_nestor_mamani", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transfer_course_distilroberta_base_mrpc_glue_nestor_mamani| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/platzi/transfer-course-distilroberta-base-mrpc-glue-nestor-mamani \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-tswana_final_all_mpnet_base_v2_en.md b/docs/_posts/ahmedlone127/2024-09-11-tswana_final_all_mpnet_base_v2_en.md new file mode 100644 index 00000000000000..8395e71e7dcfc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-tswana_final_all_mpnet_base_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tswana_final_all_mpnet_base_v2 MPNetEmbeddings from alicekwak +author: John Snow Labs +name: tswana_final_all_mpnet_base_v2 +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tswana_final_all_mpnet_base_v2` is a English model originally trained by alicekwak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tswana_final_all_mpnet_base_v2_en_5.5.0_3.0_1726054835953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tswana_final_all_mpnet_base_v2_en_5.5.0_3.0_1726054835953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("tswana_final_all_mpnet_base_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("tswana_final_all_mpnet_base_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
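+The resulting document-level vector can be pulled out of the `embeddings` column as follows (a minimal sketch):
+
+```python
+# MPNet produces one sentence-level vector per document annotation
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(1, truncate=80)
+```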
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tswana_final_all_mpnet_base_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/alicekwak/TN-final-all-mpnet-base-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-twitter_roberta_base_sentiment_gunkaynar_en.md b/docs/_posts/ahmedlone127/2024-09-11-twitter_roberta_base_sentiment_gunkaynar_en.md new file mode 100644 index 00000000000000..4b19dc51cfd399 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-twitter_roberta_base_sentiment_gunkaynar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_sentiment_gunkaynar RoBertaForSequenceClassification from gunkaynar +author: John Snow Labs +name: twitter_roberta_base_sentiment_gunkaynar +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_sentiment_gunkaynar` is a English model originally trained by gunkaynar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentiment_gunkaynar_en_5.5.0_3.0_1726062973632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentiment_gunkaynar_en_5.5.0_3.0_1726062973632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentiment_gunkaynar","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentiment_gunkaynar", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_sentiment_gunkaynar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/gunkaynar/twitter-roberta-base-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-twitter_sentiment_analysis_roberta_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-twitter_sentiment_analysis_roberta_model_pipeline_en.md new file mode 100644 index 00000000000000..189475e939b8d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-twitter_sentiment_analysis_roberta_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_sentiment_analysis_roberta_model_pipeline pipeline RoBertaForSequenceClassification from lindaclara22 +author: John Snow Labs +name: twitter_sentiment_analysis_roberta_model_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_sentiment_analysis_roberta_model_pipeline` is a English model originally trained by lindaclara22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_sentiment_analysis_roberta_model_pipeline_en_5.5.0_3.0_1726062857255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_sentiment_analysis_roberta_model_pipeline_en_5.5.0_3.0_1726062857255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_sentiment_analysis_roberta_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_sentiment_analysis_roberta_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
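+
+The snippet above references a DataFrame `df` without defining it. A minimal, hedged setup is sketched below; the `text` input column and the `class` output column are assumptions based on the DocumentAssembler and classifier listed under Included Models:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame with a single "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("twitter_sentiment_analysis_roberta_model_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```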
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_sentiment_analysis_roberta_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/lindaclara22/twitter-sentiment-analysis-roberta-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_all_scionk_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_all_scionk_en.md new file mode 100644 index 00000000000000..7785a73f9703d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_all_scionk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_scionk XlmRoBertaForTokenClassification from scionk +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_scionk +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_scionk` is a English model originally trained by scionk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_scionk_en_5.5.0_3.0_1726067712266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_scionk_en_5.5.0_3.0_1726067712266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_scionk","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_scionk", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
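+
+For a quick sanity check of the tagging output, the token and label arrays can be displayed side by side. A rough sketch, assuming `pipelineDF` from the Python example above:
+
+```python
+# "token" and "ner" are the output columns configured in the example above.
+pipelineDF.selectExpr("token.result AS tokens", "ner.result AS ner_labels") \
+    .show(truncate=False)
+```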
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_scionk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/scionk/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_huggingbase_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_huggingbase_en.md new file mode 100644 index 00000000000000..c83836bbff030d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_huggingbase_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_huggingbase XlmRoBertaForTokenClassification from huggingbase +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_huggingbase +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_huggingbase` is a English model originally trained by huggingbase. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_huggingbase_en_5.5.0_3.0_1726068591261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_huggingbase_en_5.5.0_3.0_1726068591261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_huggingbase","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_huggingbase", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_huggingbase| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline_en.md new file mode 100644 index 00000000000000..c9c97201005737 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline pipeline XlmRoBertaForTokenClassification from huggingbase +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline` is a English model originally trained by huggingbase. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline_en_5.5.0_3.0_1726068670010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline_en_5.5.0_3.0_1726068670010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_huggingbase_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline_en.md new file mode 100644 index 00000000000000..cde798e6a247a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline_en_5.5.0_3.0_1726086504579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline_en_5.5.0_3.0_1726086504579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sungkwangjoong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_chris_choi_en.md new file mode 100644 index 00000000000000..c37a483a7a35ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_chris_choi +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_chris_choi` is a English model originally trained by Chris-choi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_chris_choi_en_5.5.0_3.0_1726079179032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_chris_choi_en_5.5.0_3.0_1726079179032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_chris_choi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_chris_choi", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_chris_choi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_thaint_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_thaint_pipeline_en.md new file mode 100644 index 00000000000000..43806e93a1a6f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_thaint_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_thaint_pipeline pipeline XlmRoBertaForTokenClassification from thaint +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_thaint_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_thaint_pipeline` is a English model originally trained by thaint. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thaint_pipeline_en_5.5.0_3.0_1726079036752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thaint_pipeline_en_5.5.0_3.0_1726079036752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_thaint_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_thaint_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_thaint_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/thaint/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_mapa_coarse_ner_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_mapa_coarse_ner_en.md new file mode 100644 index 00000000000000..38842761e1e63a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_mapa_coarse_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_mapa_coarse_ner XlmRoBertaForTokenClassification from dmargutierrez +author: John Snow Labs +name: xlm_roberta_base_mapa_coarse_ner +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mapa_coarse_ner` is a English model originally trained by dmargutierrez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mapa_coarse_ner_en_5.5.0_3.0_1726077764978.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mapa_coarse_ner_en_5.5.0_3.0_1726077764978.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_mapa_coarse_ner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_mapa_coarse_ner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mapa_coarse_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|522.8 MB| + +## References + +https://huggingface.co/dmargutierrez/xlm-roberta-base-mapa_coarse-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-zh2en20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-zh2en20_pipeline_en.md new file mode 100644 index 00000000000000..740368bbe73493 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-zh2en20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English zh2en20_pipeline pipeline MarianTransformer from Carlosino +author: John Snow Labs +name: zh2en20_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`zh2en20_pipeline` is a English model originally trained by Carlosino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/zh2en20_pipeline_en_5.5.0_3.0_1726038762656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/zh2en20_pipeline_en_5.5.0_3.0_1726038762656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("zh2en20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("zh2en20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
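+
+For a one-off string, `PretrainedPipeline.annotate` avoids building a DataFrame. A hedged sketch follows; the Chinese sample sentence is made up, and the keys of the returned dictionary depend on the output columns of the included stages:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("zh2en20_pipeline", lang="en")
+
+# annotate() returns a dict keyed by the pipeline's output columns.
+result = pipeline.annotate("我喜欢自然语言处理。")
+print(result)
+```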
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|zh2en20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.5 MB| + +## References + +https://huggingface.co/Carlosino/zh2en20 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-adapter_203_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-adapter_203_pipeline_en.md new file mode 100644 index 00000000000000..df54f5b086ade5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-adapter_203_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English adapter_203_pipeline pipeline RoBertaForQuestionAnswering from kyone +author: John Snow Labs +name: adapter_203_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adapter_203_pipeline` is a English model originally trained by kyone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adapter_203_pipeline_en_5.5.0_3.0_1726107052216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adapter_203_pipeline_en_5.5.0_3.0_1726107052216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("adapter_203_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("adapter_203_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adapter_203_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.3 MB| + +## References + +https://huggingface.co/kyone/adapter_203 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-ai_generated_text_detection_en.md b/docs/_posts/ahmedlone127/2024-09-12-ai_generated_text_detection_en.md new file mode 100644 index 00000000000000..bae31fa17ec9a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-ai_generated_text_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ai_generated_text_detection BertForSequenceClassification from Juner +author: John Snow Labs +name: ai_generated_text_detection +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_detection` is a English model originally trained by Juner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_detection_en_5.5.0_3.0_1726181938327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_detection_en_5.5.0_3.0_1726181938327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("ai_generated_text_detection","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("ai_generated_text_detection", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
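+
+To look at the predicted label together with its per-class scores, the `class` annotations can be exploded. This is a sketch under the assumption that `pipelineDF` comes from the example above and that the annotation metadata carries the class scores (the exact keys depend on the model's labels):
+
+```python
+from pyspark.sql import functions as F
+
+# Each "class" annotation holds the winning label in `result` and, typically,
+# per-class scores in `metadata`.
+exploded = pipelineDF.select(F.explode("class").alias("prediction"))
+exploded.select(
+    F.col("prediction.result").alias("label"),
+    F.col("prediction.metadata").alias("scores"),
+).show(truncate=False)
+```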
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|383.2 MB| + +## References + +https://huggingface.co/Juner/AI-generated-text-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-ai_generated_text_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-ai_generated_text_detection_pipeline_en.md new file mode 100644 index 00000000000000..30962a05f671c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-ai_generated_text_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_generated_text_detection_pipeline pipeline BertForSequenceClassification from Juner +author: John Snow Labs +name: ai_generated_text_detection_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_detection_pipeline` is a English model originally trained by Juner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_detection_pipeline_en_5.5.0_3.0_1726181956352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_detection_pipeline_en_5.5.0_3.0_1726181956352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ai_generated_text_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ai_generated_text_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|383.2 MB| + +## References + +https://huggingface.co/Juner/AI-generated-text-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-amharicqa_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-12-amharicqa_roberta_en.md new file mode 100644 index 00000000000000..d2686ac66a9fa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-amharicqa_roberta_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English amharicqa_roberta RoBertaForQuestionAnswering from Gizachew +author: John Snow Labs +name: amharicqa_roberta +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amharicqa_roberta` is a English model originally trained by Gizachew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amharicqa_roberta_en_5.5.0_3.0_1726106679183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amharicqa_roberta_en_5.5.0_3.0_1726106679183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("amharicqa_roberta","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("amharicqa_roberta", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
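+
+After the pipeline has been fit and applied, the extracted answer spans end up in the `answer` column configured on the span classifier. A minimal sketch, assuming `pipelineDF` from the Python example above:
+
+```python
+# "question" is the raw input column; "answer.result" holds the predicted span.
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```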
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amharicqa_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/Gizachew/AmharicQA-RoBERta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-answer_equivalence_tiny_bert_zli12321_en.md b/docs/_posts/ahmedlone127/2024-09-12-answer_equivalence_tiny_bert_zli12321_en.md new file mode 100644 index 00000000000000..24535926d923de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-answer_equivalence_tiny_bert_zli12321_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English answer_equivalence_tiny_bert_zli12321 BertForSequenceClassification from zli12321 +author: John Snow Labs +name: answer_equivalence_tiny_bert_zli12321 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`answer_equivalence_tiny_bert_zli12321` is a English model originally trained by zli12321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/answer_equivalence_tiny_bert_zli12321_en_5.5.0_3.0_1726123181941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/answer_equivalence_tiny_bert_zli12321_en_5.5.0_3.0_1726123181941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("answer_equivalence_tiny_bert_zli12321","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("answer_equivalence_tiny_bert_zli12321", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|answer_equivalence_tiny_bert_zli12321| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/zli12321/answer_equivalence_tiny_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-arabic_distilbert_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-arabic_distilbert_model_pipeline_en.md new file mode 100644 index 00000000000000..e601037d40e05a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-arabic_distilbert_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English arabic_distilbert_model_pipeline pipeline BertForSequenceClassification from shahendaadel211 +author: John Snow Labs +name: arabic_distilbert_model_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabic_distilbert_model_pipeline` is a English model originally trained by shahendaadel211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabic_distilbert_model_pipeline_en_5.5.0_3.0_1726181892415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabic_distilbert_model_pipeline_en_5.5.0_3.0_1726181892415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arabic_distilbert_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arabic_distilbert_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabic_distilbert_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|43.6 MB| + +## References + +https://huggingface.co/shahendaadel211/arabic-distilbert-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-autotrain_emotion_detection_1366352626_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-autotrain_emotion_detection_1366352626_pipeline_en.md new file mode 100644 index 00000000000000..98b5a831af815f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-autotrain_emotion_detection_1366352626_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_emotion_detection_1366352626_pipeline pipeline BertForSequenceClassification from rahulmallah +author: John Snow Labs +name: autotrain_emotion_detection_1366352626_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_emotion_detection_1366352626_pipeline` is a English model originally trained by rahulmallah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_emotion_detection_1366352626_pipeline_en_5.5.0_3.0_1726182096616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_emotion_detection_1366352626_pipeline_en_5.5.0_3.0_1726182096616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_emotion_detection_1366352626_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_emotion_detection_1366352626_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_emotion_detection_1366352626_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/rahulmallah/autotrain-emotion-detection-1366352626 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_en.md new file mode 100644 index 00000000000000..dcb46f7c68cc97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_en_5.5.0_3.0_1726175870805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_en_5.5.0_3.0_1726175870805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_wikipedia1_2.5M-without-Masking-seed6-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..437a1c88181e66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726175872789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726175872789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_wikipedia1_2_5m_without_masking_seed6_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_wikipedia1_2.5M-without-Masking-seed6-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_base_arabic_emotion_analysis_v2_en.md b/docs/_posts/ahmedlone127/2024-09-12-bert_base_arabic_emotion_analysis_v2_en.md new file mode 100644 index 00000000000000..aff1f3f9238c48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_base_arabic_emotion_analysis_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_arabic_emotion_analysis_v2 BertForSequenceClassification from alpcansoydas +author: John Snow Labs +name: bert_base_arabic_emotion_analysis_v2 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabic_emotion_analysis_v2` is a English model originally trained by alpcansoydas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabic_emotion_analysis_v2_en_5.5.0_3.0_1726181854542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabic_emotion_analysis_v2_en_5.5.0_3.0_1726181854542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_arabic_emotion_analysis_v2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_arabic_emotion_analysis_v2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabic_emotion_analysis_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.2 MB| + +## References + +https://huggingface.co/alpcansoydas/bert-base-arabic-emotion-analysis-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_classifier_autotrain_sentiment_4_812425472_en.md b/docs/_posts/ahmedlone127/2024-09-12-bert_classifier_autotrain_sentiment_4_812425472_en.md new file mode 100644 index 00000000000000..84aa579dd50ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_classifier_autotrain_sentiment_4_812425472_en.md @@ -0,0 +1,104 @@ +--- +layout: model +title: English BertForSequenceClassification Cased model (from charly) +author: John Snow Labs +name: bert_classifier_autotrain_sentiment_4_812425472 +date: 2024-09-12 +tags: [en, open_source, bert, sequence_classification, classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-sentiment-4-812425472` is a English model originally trained by `charly`. + +## Predicted Entities + +`negative`, `mixed`, `no_impact`, `positive` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_autotrain_sentiment_4_812425472_en_5.5.0_3.0_1726182078767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_autotrain_sentiment_4_812425472_en_5.5.0_3.0_1726182078767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_autotrain_sentiment_4_812425472","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_autotrain_sentiment_4_812425472","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, seq_classifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```

{:.nlu-block}
```python
import nlu
nlu.load("en.classify.bert.sentiment.by_charly").predict("""PUT YOUR STRING HERE""")
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_autotrain_sentiment_4_812425472| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +References + +- https://huggingface.co/charly/autotrain-sentiment-4-812425472 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_corpus_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-bert_corpus_ft_pipeline_en.md new file mode 100644 index 00000000000000..ad1159478701af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_corpus_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_corpus_ft_pipeline pipeline RoBertaEmbeddings from desarrolloasesoreslocales +author: John Snow Labs +name: bert_corpus_ft_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_corpus_ft_pipeline` is a English model originally trained by desarrolloasesoreslocales. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_corpus_ft_pipeline_en_5.5.0_3.0_1726113243794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_corpus_ft_pipeline_en_5.5.0_3.0_1726113243794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_corpus_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_corpus_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
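For a fully self-contained run of this pretrained pipeline, only the `PretrainedPipeline` helper and an input DataFrame are needed. A minimal sketch, assuming the pipeline's first stage reads a `text` column (the usual convention in these model docs) and that no Spark session exists yet:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # start or attach to a Spark NLP session

pipeline = PretrainedPipeline("bert_corpus_ft_pipeline", lang="en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
annotations.printSchema()  # lists the annotation columns added by each stage
```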
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_corpus_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|478.9 MB| + +## References + +https://huggingface.co/desarrolloasesoreslocales/bert-corpus-ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_resume_classification_en.md b/docs/_posts/ahmedlone127/2024-09-12-bert_resume_classification_en.md new file mode 100644 index 00000000000000..2800cdcbfdcc8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_resume_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_resume_classification BertForSequenceClassification from ahmedheakl +author: John Snow Labs +name: bert_resume_classification +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_resume_classification` is a English model originally trained by ahmedheakl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_resume_classification_en_5.5.0_3.0_1726181879167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_resume_classification_en_5.5.0_3.0_1726181879167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_resume_classification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_resume_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
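Once fitted, the pipeline can be persisted and reloaded with standard Spark ML APIs, which avoids re-downloading the model on every run. A sketch, assuming `pipelineModel` from the Python example above; the path is only a placeholder:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline to disk (placeholder path).
pipelineModel.write().overwrite().save("/tmp/resume_clf_pipeline")

# Reload it later without fitting again.
reloaded = PipelineModel.load("/tmp/resume_clf_pipeline")
reloaded.transform(data).select("class.result").show(truncate=False)
```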
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_resume_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/ahmedheakl/bert-resume-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_token_classifier_parsbert_armanner_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-12-bert_token_classifier_parsbert_armanner_pipeline_fa.md new file mode 100644 index 00000000000000..6400e4918d77cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_token_classifier_parsbert_armanner_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian bert_token_classifier_parsbert_armanner_pipeline pipeline BertForTokenClassification from HooshvareLab +author: John Snow Labs +name: bert_token_classifier_parsbert_armanner_pipeline +date: 2024-09-12 +tags: [fa, open_source, pipeline, onnx] +task: Named Entity Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_token_classifier_parsbert_armanner_pipeline` is a Persian model originally trained by HooshvareLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_parsbert_armanner_pipeline_fa_5.5.0_3.0_1726174222716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_parsbert_armanner_pipeline_fa_5.5.0_3.0_1726174222716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_token_classifier_parsbert_armanner_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_token_classifier_parsbert_armanner_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_parsbert_armanner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/HooshvareLab/bert-base-parsbert-armanner-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-boolq_microsoft_deberta_v3_base_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-boolq_microsoft_deberta_v3_base_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..93e938a71d8b8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-boolq_microsoft_deberta_v3_base_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English boolq_microsoft_deberta_v3_base_seed_3_pipeline pipeline DeBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: boolq_microsoft_deberta_v3_base_seed_3_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`boolq_microsoft_deberta_v3_base_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/boolq_microsoft_deberta_v3_base_seed_3_pipeline_en_5.5.0_3.0_1726133395910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/boolq_microsoft_deberta_v3_base_seed_3_pipeline_en_5.5.0_3.0_1726133395910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("boolq_microsoft_deberta_v3_base_seed_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("boolq_microsoft_deberta_v3_base_seed_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|boolq_microsoft_deberta_v3_base_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|620.1 MB| + +## References + +https://huggingface.co/utahnlp/boolq_microsoft_deberta-v3-base_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_model_willw9758_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_model_willw9758_pipeline_en.md new file mode 100644 index 00000000000000..2f5da605d96aeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_model_willw9758_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_willw9758_pipeline pipeline RoBertaForSequenceClassification from willw9758 +author: John Snow Labs +name: burmese_awesome_model_willw9758_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_willw9758_pipeline` is a English model originally trained by willw9758. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_willw9758_pipeline_en_5.5.0_3.0_1726108372612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_willw9758_pipeline_en_5.5.0_3.0_1726108372612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_willw9758_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_willw9758_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_willw9758_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.1 MB| + +## References + +https://huggingface.co/willw9758/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_perrysburg_knight_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_perrysburg_knight_en.md new file mode 100644 index 00000000000000..e53539fd9cb41e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_perrysburg_knight_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_perrysburg_knight DistilBertForQuestionAnswering from Perrysburg-Knight +author: John Snow Labs +name: burmese_awesome_qa_model_perrysburg_knight +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_perrysburg_knight` is a English model originally trained by Perrysburg-Knight. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_perrysburg_knight_en_5.5.0_3.0_1726180489332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_perrysburg_knight_en_5.5.0_3.0_1726180489332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_perrysburg_knight","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_perrysburg_knight", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
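The `answer` column is an array of annotations whose `result` field holds the extracted span. A quick check, assuming `pipelineDF` from the Python example above:

```python
# Show the question, the context, and the predicted answer span.
pipelineDF.select("question", "context", "answer.result").show(truncate=False)
```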
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_perrysburg_knight| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Perrysburg-Knight/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_razafridi088_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_razafridi088_en.md new file mode 100644 index 00000000000000..c753f49ec8c521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_razafridi088_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_razafridi088 DistilBertForQuestionAnswering from razafridi088 +author: John Snow Labs +name: burmese_awesome_qa_model_razafridi088 +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_razafridi088` is a English model originally trained by razafridi088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_razafridi088_en_5.5.0_3.0_1726180391908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_razafridi088_en_5.5.0_3.0_1726180391908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_razafridi088","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_razafridi088", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_razafridi088| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/razafridi088/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_sritharansk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_sritharansk_pipeline_en.md new file mode 100644 index 00000000000000..9b118763ea29b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_sritharansk_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_sritharansk_pipeline pipeline DistilBertForQuestionAnswering from sritharansk +author: John Snow Labs +name: burmese_awesome_qa_model_sritharansk_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_sritharansk_pipeline` is a English model originally trained by sritharansk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_sritharansk_pipeline_en_5.5.0_3.0_1726180487727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_sritharansk_pipeline_en_5.5.0_3.0_1726180487727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_sritharansk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_sritharansk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_sritharansk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/sritharansk/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-datosw_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-datosw_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..6aed8042ebacd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-datosw_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English datosw_v1_2_pipeline pipeline MarianTransformer from ZcepZtar +author: John Snow Labs +name: datosw_v1_2_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`datosw_v1_2_pipeline` is a English model originally trained by ZcepZtar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/datosw_v1_2_pipeline_en_5.5.0_3.0_1726111196157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/datosw_v1_2_pipeline_en_5.5.0_3.0_1726111196157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("datosw_v1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("datosw_v1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|datosw_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|504.8 MB| + +## References + +https://huggingface.co/ZcepZtar/DaToSw_V1.2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp_en.md new file mode 100644 index 00000000000000..b1c006cd1328c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp_en_5.5.0_3.0_1726100372422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp_en_5.5.0_3.0_1726100372422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
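If you need the label set this classifier was trained with, sequence-classification annotators in Spark NLP expose it on the loaded model. A short sketch, assuming the `sequenceClassifier` loaded in the Python example above:

```python
# The class labels bundled with the pretrained model.
print(sequenceClassifier.getClasses())
```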
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge13_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge13_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_medical_question_answer_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_medical_question_answer_finetuned_squad_en.md new file mode 100644 index 00000000000000..77e9e5f269e0e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_medical_question_answer_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_medical_question_answer_finetuned_squad DistilBertForQuestionAnswering from Ashmit06 +author: John Snow Labs +name: distilbert_medical_question_answer_finetuned_squad +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_medical_question_answer_finetuned_squad` is a English model originally trained by Ashmit06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_medical_question_answer_finetuned_squad_en_5.5.0_3.0_1726180426989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_medical_question_answer_finetuned_squad_en_5.5.0_3.0_1726180426989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_medical_question_answer_finetuned_squad","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_medical_question_answer_finetuned_squad", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_medical_question_answer_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Ashmit06/distilbert-medical-question_answer-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-dken_en.md b/docs/_posts/ahmedlone127/2024-09-12-dken_en.md new file mode 100644 index 00000000000000..3aab6a30a0816d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-dken_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dken MarianTransformer from joneikholm +author: John Snow Labs +name: dken +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dken` is a English model originally trained by joneikholm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dken_en_5.5.0_3.0_1726167419719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dken_en_5.5.0_3.0_1726167419719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("dken","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("dken", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
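The MarianTransformer stage writes one translated entry per detected sentence into the `translation` column. A quick look at the output, assuming `pipelineDF` from the Python example above:

```python
# One translated string per detected sentence.
pipelineDF.selectExpr("explode(translation.result) as translated_text").show(truncate=False)
```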
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dken| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/joneikholm/dken \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-finetuned_twitter_targeted_insult_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-12-finetuned_twitter_targeted_insult_roberta_en.md new file mode 100644 index 00000000000000..bf4b2eca170d4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-finetuned_twitter_targeted_insult_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_twitter_targeted_insult_roberta XlmRoBertaForSequenceClassification from coderSounak +author: John Snow Labs +name: finetuned_twitter_targeted_insult_roberta +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_twitter_targeted_insult_roberta` is a English model originally trained by coderSounak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_twitter_targeted_insult_roberta_en_5.5.0_3.0_1726147342244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_twitter_targeted_insult_roberta_en_5.5.0_3.0_1726147342244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("finetuned_twitter_targeted_insult_roberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("finetuned_twitter_targeted_insult_roberta", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_twitter_targeted_insult_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/coderSounak/finetuned_twitter_targeted_insult_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-finetuning_emotion_model_surajmahapatra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-finetuning_emotion_model_surajmahapatra_pipeline_en.md new file mode 100644 index 00000000000000..e5adcca992b83e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-finetuning_emotion_model_surajmahapatra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_emotion_model_surajmahapatra_pipeline pipeline DistilBertForSequenceClassification from SurajMahapatra +author: John Snow Labs +name: finetuning_emotion_model_surajmahapatra_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_surajmahapatra_pipeline` is a English model originally trained by SurajMahapatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_surajmahapatra_pipeline_en_5.5.0_3.0_1726100121414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_surajmahapatra_pipeline_en_5.5.0_3.0_1726100121414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_emotion_model_surajmahapatra_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_emotion_model_surajmahapatra_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_surajmahapatra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SurajMahapatra/finetuning-emotion-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-finetuning_segment_model_3000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-finetuning_segment_model_3000_samples_pipeline_en.md new file mode 100644 index 00000000000000..30dfc855417646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-finetuning_segment_model_3000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_segment_model_3000_samples_pipeline pipeline DistilBertForSequenceClassification from MattiaCampanella1993 +author: John Snow Labs +name: finetuning_segment_model_3000_samples_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_segment_model_3000_samples_pipeline` is a English model originally trained by MattiaCampanella1993. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_segment_model_3000_samples_pipeline_en_5.5.0_3.0_1726124781847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_segment_model_3000_samples_pipeline_en_5.5.0_3.0_1726124781847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_segment_model_3000_samples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_segment_model_3000_samples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_segment_model_3000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MattiaCampanella1993/finetuning-segment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-first_qa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-first_qa_model_pipeline_en.md new file mode 100644 index 00000000000000..85251780d489c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-first_qa_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English first_qa_model_pipeline pipeline DistilBertForQuestionAnswering from hklair +author: John Snow Labs +name: first_qa_model_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`first_qa_model_pipeline` is a English model originally trained by hklair. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/first_qa_model_pipeline_en_5.5.0_3.0_1726180185784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/first_qa_model_pipeline_en_5.5.0_3.0_1726180185784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("first_qa_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("first_qa_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|first_qa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/hklair/first_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-flat_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-flat_model_pipeline_en.md new file mode 100644 index 00000000000000..7a100c730ead21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-flat_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English flat_model_pipeline pipeline DistilBertForQuestionAnswering from rugvedabodke +author: John Snow Labs +name: flat_model_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flat_model_pipeline` is a English model originally trained by rugvedabodke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flat_model_pipeline_en_5.5.0_3.0_1726180586813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flat_model_pipeline_en_5.5.0_3.0_1726180586813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("flat_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("flat_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flat_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/rugvedabodke/flat_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-gal_ner_xlmr_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-gal_ner_xlmr_2_pipeline_en.md new file mode 100644 index 00000000000000..d515b8db00f7cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-gal_ner_xlmr_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_ner_xlmr_2_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_xlmr_2_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_xlmr_2_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_xlmr_2_pipeline_en_5.5.0_3.0_1726131307420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_xlmr_2_pipeline_en_5.5.0_3.0_1726131307420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_ner_xlmr_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_ner_xlmr_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
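For quick, single-document checks the pretrained NER pipeline can also be driven through `annotate`, which returns plain Python lists instead of a DataFrame. A minimal sketch, assuming a running Spark NLP session; the sample sentence is a placeholder, and the output keys (`token`, `ner`) follow the usual Spark NLP column naming and are an assumption here:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("gal_ner_xlmr_2_pipeline", lang="en")
result = pipeline.annotate("John Snow Labs is based in Delaware.")

# Tokens and their predicted tags line up index by index.
print(list(zip(result["token"], result["ner"])))
```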
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_xlmr_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|808.7 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-xlmr-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-german_tonga_tonga_islands_english_en.md b/docs/_posts/ahmedlone127/2024-09-12-german_tonga_tonga_islands_english_en.md new file mode 100644 index 00000000000000..0947fb22462af4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-german_tonga_tonga_islands_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English german_tonga_tonga_islands_english MarianTransformer from gouravsinha +author: John Snow Labs +name: german_tonga_tonga_islands_english +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`german_tonga_tonga_islands_english` is a English model originally trained by gouravsinha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/german_tonga_tonga_islands_english_en_5.5.0_3.0_1726110593604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/german_tonga_tonga_islands_english_en_5.5.0_3.0_1726110593604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("german_tonga_tonga_islands_english","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("german_tonga_tonga_islands_english", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|german_tonga_tonga_islands_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|499.4 MB| + +## References + +https://huggingface.co/gouravsinha/german_to_english \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-12-hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..9409a88c06642f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1726117886460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1726117886460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
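For low-latency scoring of individual strings (for example inside a service endpoint), the fitted pipeline can be wrapped in a `LightPipeline`, which skips the DataFrame round-trip. A sketch, assuming `pipelineModel` from the Python example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a dict mapping each output column to its results for one string.
print(light.annotate("I love spark-nlp")["class"])
```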
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed0_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed0-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-italian_emotion_analyzer_it.md b/docs/_posts/ahmedlone127/2024-09-12-italian_emotion_analyzer_it.md new file mode 100644 index 00000000000000..52b07577997710 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-italian_emotion_analyzer_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian italian_emotion_analyzer BertForSequenceClassification from aiknowyou +author: John Snow Labs +name: italian_emotion_analyzer +date: 2024-09-12 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`italian_emotion_analyzer` is a Italian model originally trained by aiknowyou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/italian_emotion_analyzer_it_5.5.0_3.0_1726104510722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/italian_emotion_analyzer_it_5.5.0_3.0_1726104510722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("italian_emotion_analyzer","it") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("italian_emotion_analyzer", "it")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|italian_emotion_analyzer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|412.0 MB| + +## References + +https://huggingface.co/aiknowyou/it-emotion-analyzer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lab1_random_bill1886_en.md b/docs/_posts/ahmedlone127/2024-09-12-lab1_random_bill1886_en.md new file mode 100644 index 00000000000000..7e3e37adc4ecc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lab1_random_bill1886_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_random_bill1886 MarianTransformer from bill1886 +author: John Snow Labs +name: lab1_random_bill1886 +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_bill1886` is a English model originally trained by bill1886. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_bill1886_en_5.5.0_3.0_1726168167511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_bill1886_en_5.5.0_3.0_1726168167511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("lab1_random_bill1886","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("lab1_random_bill1886","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
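+
+As a minimal usage sketch, assuming the Python pipeline above where the MarianTransformer writes to the `translation` column, the translated sentences can be read back directly from the result DataFrame:
+
+```python
+# Each row holds the translation(s) produced for the corresponding input text.
+pipelineDF.select("text", "translation.result").show(truncate=False)
+```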
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_bill1886| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|509.8 MB| + +## References + +https://huggingface.co/bill1886/lab1_random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lenu_us_ma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-lenu_us_ma_pipeline_en.md new file mode 100644 index 00000000000000..0bdb9723fa871b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lenu_us_ma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenu_us_ma_pipeline pipeline BertForSequenceClassification from Sociovestix +author: John Snow Labs +name: lenu_us_ma_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenu_us_ma_pipeline` is a English model originally trained by Sociovestix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenu_us_ma_pipeline_en_5.5.0_3.0_1726123261499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenu_us_ma_pipeline_en_5.5.0_3.0_1726123261499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lenu_us_ma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lenu_us_ma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
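+
+Here `df` is any Spark DataFrame with a `text` column. The sketch below is a minimal, assumed end-to-end example: it presumes an active Spark NLP session (`spark`) and that the pipeline's classifier writes its predictions to a column named `class`.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Build a one-row DataFrame with the expected "text" column and run the pipeline on it.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("lenu_us_ma_pipeline", lang="en")
+annotations = pipeline.transform(df)
+
+# Assumed output column name; inspect annotations.columns if it differs.
+annotations.select("class.result").show(truncate=False)
+```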
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenu_us_ma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/Sociovestix/lenu_US-MA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..05c3c2185561ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline pipeline MarianTransformer from e1879 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline` is a English model originally trained by e1879. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline_en_5.5.0_3.0_1726127006155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline_en_5.5.0_3.0_1726127006155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_chinese_twi_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.2 MB| + +## References + +https://huggingface.co/e1879/marian-finetuned-kde4-en-to-zh-tw-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman_en.md new file mode 100644 index 00000000000000..d4a0e8c110c2ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman MarianTransformer from mbateman +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman` is a English model originally trained by mbateman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman_en_5.5.0_3.0_1726168045446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman_en_5.5.0_3.0_1726168045446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_mbateman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.0 MB| + +## References + +https://huggingface.co/mbateman/marian-finetuned-kde4-en-to-fr-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan_en.md new file mode 100644 index 00000000000000..d21ceeedfb3123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan MarianTransformer from liquannan +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan` is a English model originally trained by liquannan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan_en_5.5.0_3.0_1726168227683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan_en_5.5.0_3.0_1726168227683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_liquannan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/liquannan/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1_en.md new file mode 100644 index 00000000000000..92a31576186dbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1 MarianTransformer from nick-carroll1 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1 +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1` is a English model originally trained by nick-carroll1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1_en_5.5.0_3.0_1726167988780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1_en_5.5.0_3.0_1726167988780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_nick_carroll1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/nick-carroll1/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu_en.md new file mode 100644 index 00000000000000..5a474f3de24bac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu MarianTransformer from ricerlu +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu` is a English model originally trained by ricerlu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu_en_5.5.0_3.0_1726167871404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu_en_5.5.0_3.0_1726167871404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ricerlu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/ricerlu/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline_en.md new file mode 100644 index 00000000000000..f2f1e4e3cc60d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline pipeline MarianTransformer from vonewman +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline` is a English model originally trained by vonewman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline_en_5.5.0_3.0_1726162154958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline_en_5.5.0_3.0_1726162154958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_vonewman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/vonewman/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_amazon_massive_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_amazon_massive_intent_pipeline_en.md new file mode 100644 index 00000000000000..93937bfc1d3744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_amazon_massive_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mdeberta_v3_base_amazon_massive_intent_pipeline pipeline DeBertaForSequenceClassification from cartesinus +author: John Snow Labs +name: mdeberta_v3_base_amazon_massive_intent_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdeberta_v3_base_amazon_massive_intent_pipeline` is a English model originally trained by cartesinus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_amazon_massive_intent_pipeline_en_5.5.0_3.0_1726162994037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_amazon_massive_intent_pipeline_en_5.5.0_3.0_1726162994037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mdeberta_v3_base_amazon_massive_intent_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mdeberta_v3_base_amazon_massive_intent_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdeberta_v3_base_amazon_massive_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.3 MB| + +## References + +https://huggingface.co/cartesinus/mdeberta-v3-base_amazon-massive_intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_rte_100_en.md b/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_rte_100_en.md new file mode 100644 index 00000000000000..6084c6f609b07b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_rte_100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mdeberta_v3_base_rte_100 DeBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: mdeberta_v3_base_rte_100 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdeberta_v3_base_rte_100` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_rte_100_en_5.5.0_3.0_1726163501451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_rte_100_en_5.5.0_3.0_1726163501451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("mdeberta_v3_base_rte_100","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("mdeberta_v3_base_rte_100", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdeberta_v3_base_rte_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|790.6 MB| + +## References + +https://huggingface.co/tmnam20/mdeberta-v3-base-rte-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-misinformation_covid_distilbert_base_german_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-misinformation_covid_distilbert_base_german_cased_pipeline_en.md new file mode 100644 index 00000000000000..e4246c3c7c1ac4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-misinformation_covid_distilbert_base_german_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English misinformation_covid_distilbert_base_german_cased_pipeline pipeline DistilBertForSequenceClassification from Ghunghru +author: John Snow Labs +name: misinformation_covid_distilbert_base_german_cased_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`misinformation_covid_distilbert_base_german_cased_pipeline` is a English model originally trained by Ghunghru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/misinformation_covid_distilbert_base_german_cased_pipeline_en_5.5.0_3.0_1726125026611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/misinformation_covid_distilbert_base_german_cased_pipeline_en_5.5.0_3.0_1726125026611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("misinformation_covid_distilbert_base_german_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("misinformation_covid_distilbert_base_german_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|misinformation_covid_distilbert_base_german_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|252.5 MB| + +## References + +https://huggingface.co/Ghunghru/Misinformation-Covid-distilbert-base-german-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-multi_label_food_en.md b/docs/_posts/ahmedlone127/2024-09-12-multi_label_food_en.md new file mode 100644 index 00000000000000..90950de7c2f0e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-multi_label_food_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multi_label_food BertForSequenceClassification from munirot +author: John Snow Labs +name: multi_label_food +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_label_food` is a English model originally trained by munirot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_label_food_en_5.5.0_3.0_1726182333161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_label_food_en_5.5.0_3.0_1726182333161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("multi_label_food","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("multi_label_food", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_label_food| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.5 MB| + +## References + +https://huggingface.co/munirot/multi-label-food \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..733a21642fe41b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726117676836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726117676836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random3_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random3_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-neurips_roberta_covid_1_en.md b/docs/_posts/ahmedlone127/2024-09-12-neurips_roberta_covid_1_en.md new file mode 100644 index 00000000000000..be0fc0c917fc22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-neurips_roberta_covid_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English neurips_roberta_covid_1 RoBertaForSequenceClassification from neurips-user +author: John Snow Labs +name: neurips_roberta_covid_1 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`neurips_roberta_covid_1` is a English model originally trained by neurips-user. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/neurips_roberta_covid_1_en_5.5.0_3.0_1726108470725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/neurips_roberta_covid_1_en_5.5.0_3.0_1726108470725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("neurips_roberta_covid_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("neurips_roberta_covid_1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|neurips_roberta_covid_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|418.0 MB| + +## References + +https://huggingface.co/neurips-user/neurips-roberta-covid-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-nsfw_text_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-nsfw_text_pipeline_en.md new file mode 100644 index 00000000000000..cd56d3acc17a82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-nsfw_text_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nsfw_text_pipeline pipeline DistilBertForSequenceClassification from circulus +author: John Snow Labs +name: nsfw_text_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nsfw_text_pipeline` is a English model originally trained by circulus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nsfw_text_pipeline_en_5.5.0_3.0_1726124979284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nsfw_text_pipeline_en_5.5.0_3.0_1726124979284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nsfw_text_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nsfw_text_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nsfw_text_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/circulus/nsfw_text + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_chinese_hk_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_chinese_hk_en.md new file mode 100644 index 00000000000000..2bd3c80898a925 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_chinese_hk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_chinese_hk MarianTransformer from steve-tong +author: John Snow Labs +name: opus_maltese_english_chinese_hk +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_chinese_hk` is a English model originally trained by steve-tong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_chinese_hk_en_5.5.0_3.0_1726162051009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_chinese_hk_en_5.5.0_3.0_1726162051009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_chinese_hk","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_chinese_hk","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_chinese_hk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|542.5 MB| + +## References + +https://huggingface.co/steve-tong/opus-mt-en-zh-hk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane_en.md new file mode 100644 index 00000000000000..e8ca2e5bd7f7aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane MarianTransformer from ethanlane +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane` is a English model originally trained by ethanlane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane_en_5.5.0_3.0_1726167437388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane_en_5.5.0_3.0_1726167437388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ethanlane| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/ethanlane/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline_en.md new file mode 100644 index 00000000000000..37670987374ebb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline pipeline MarianTransformer from Gare +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline` is a English model originally trained by Gare. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline_en_5.5.0_3.0_1726167470556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline_en_5.5.0_3.0_1726167470556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
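+
+The snippet above assumes `df` already exists. Since the bundled pipeline starts with a DocumentAssembler (see "Included Models" below), a DataFrame with a `text` column should be a suitable input; a minimal sketch:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```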
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_gare_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/Gare/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977_en.md new file mode 100644 index 00000000000000..490691b6b4800e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977 MarianTransformer from Recoil1977 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977 +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977` is a English model originally trained by Recoil1977. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977_en_5.5.0_3.0_1726126664578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977_en_5.5.0_3.0_1726126664578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split each document into sentences before feeding them to the translator
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
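+
+For quick single-document checks, the fitted pipeline can also be wrapped in a LightPipeline, which annotates plain strings without building a DataFrame; a minimal sketch:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# returns a dict keyed by output column, e.g. "translation"
+print(light.annotate("I love spark-nlp"))
+```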
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_recoil1977| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/Recoil1977/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline_en.md new file mode 100644 index 00000000000000..d3e74ce5ad259b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline pipeline MarianTransformer from edu-shok +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline` is a English model originally trained by edu-shok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline_en_5.5.0_3.0_1726161624568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline_en_5.5.0_3.0_1726161624568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
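+
+Besides `transform`, a PretrainedPipeline can annotate a single string directly, which is handy for quick checks; a minimal sketch:
+
+```python
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```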
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_tamil_5epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.7 MB| + +## References + +https://huggingface.co/edu-shok/opus-mt-en-es-finetuned-en-to-es-TA-5EPOCHS + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline_en.md new file mode 100644 index 00000000000000..aafe9cce77fd93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline pipeline MarianTransformer from tdobrxl +author: John Snow Labs +name: opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline` is a English model originally trained by tdobrxl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline_en_5.5.0_3.0_1726168319943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline_en_5.5.0_3.0_1726168319943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
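+
+The downloaded object wraps a regular Spark ML `PipelineModel`, so its stages can be inspected programmatically; a small sketch (the `model` attribute is assumed to expose the underlying PipelineModel):
+
+```python
+# the stages should mirror the "Included Models" section of this card
+for stage in pipeline.model.stages:
+    print(stage)
+```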
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_vietnamese_finetuned_iwslt15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.2 MB| + +## References + +https://huggingface.co/tdobrxl/opus-mt-en-vi-finetuned-IWSLT15 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian_en.md new file mode 100644 index 00000000000000..f2e7db36c84c12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian MarianTransformer from hermanshid +author: John Snow Labs +name: opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian` is a English model originally trained by hermanshid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian_en_5.5.0_3.0_1726126662061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian_en_5.5.0_3.0_1726126662061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split each document into sentences before feeding them to the translator
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_finetuned_sundanese_tonga_tonga_islands_indonesian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|507.7 MB| + +## References + +https://huggingface.co/hermanshid/opus-mt-finetuned-su-to-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol_en.md new file mode 100644 index 00000000000000..45909d4b3bce08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol MarianTransformer from svsokol +author: John Snow Labs +name: opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol` is a English model originally trained by svsokol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol_en_5.5.0_3.0_1726110957840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol_en_5.5.0_3.0_1726110957840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split each document into sentences before feeding them to the translator
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_svsokol| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|526.4 MB| + +## References + +https://huggingface.co/svsokol/opus-mt-ru-en-finetuned-en-to-ru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-politifact_roberta_clf_en.md b/docs/_posts/ahmedlone127/2024-09-12-politifact_roberta_clf_en.md new file mode 100644 index 00000000000000..1d7fd396b80e6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-politifact_roberta_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English politifact_roberta_clf RoBertaForSequenceClassification from ChandlerU11 +author: John Snow Labs +name: politifact_roberta_clf +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`politifact_roberta_clf` is a English model originally trained by ChandlerU11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/politifact_roberta_clf_en_5.5.0_3.0_1726117416234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/politifact_roberta_clf_en_5.5.0_3.0_1726117416234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("politifact_roberta_clf","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("politifact_roberta_clf", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
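+
+The predicted labels land in the `class` output column of `pipelineDF`; a minimal sketch for inspecting them next to the input text:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```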
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|politifact_roberta_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.8 MB| + +## References + +https://huggingface.co/ChandlerU11/politifact_rOBERTa_CLF \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-quran_whisper_tiny_v1_ar.md b/docs/_posts/ahmedlone127/2024-09-12-quran_whisper_tiny_v1_ar.md new file mode 100644 index 00000000000000..95a1bfc44605fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-quran_whisper_tiny_v1_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic quran_whisper_tiny_v1 WhisperForCTC from cherifkhalifah +author: John Snow Labs +name: quran_whisper_tiny_v1 +date: 2024-09-12 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quran_whisper_tiny_v1` is a Arabic model originally trained by cherifkhalifah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quran_whisper_tiny_v1_ar_5.5.0_3.0_1726137985452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quran_whisper_tiny_v1_ar_5.5.0_3.0_1726137985452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("quran_whisper_tiny_v1","ar") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("quran_whisper_tiny_v1", "ar")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
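+
+A minimal, hypothetical sketch for building the `data` DataFrame used above (librosa and the 16 kHz sampling rate are assumptions; any decoder that yields a float array works):
+
+```python
+import librosa
+
+# decode a local file to mono floats; Whisper checkpoints typically expect 16 kHz audio
+raw_floats, _ = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
+```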
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quran_whisper_tiny_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|390.6 MB| + +## References + +https://huggingface.co/cherifkhalifah/quran-whisper-tiny-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_base_finetuned_squad_f_arnold_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_finetuned_squad_f_arnold_en.md new file mode 100644 index 00000000000000..d128eed3a95c3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_finetuned_squad_f_arnold_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_f_arnold RoBertaForQuestionAnswering from f-arnold +author: John Snow Labs +name: roberta_base_finetuned_squad_f_arnold +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_f_arnold` is a English model originally trained by f-arnold. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_f_arnold_en_5.5.0_3.0_1726176053393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_f_arnold_en_5.5.0_3.0_1726176053393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_f_arnold","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_f_arnold", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
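+
+The extracted answer spans land in the `answer` output column; a minimal sketch for reading them out of `pipelineDF`:
+
+```python
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```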
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_f_arnold| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|461.8 MB| + +## References + +https://huggingface.co/f-arnold/roberta-base-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_base_qa_squad_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_qa_squad_en.md new file mode 100644 index 00000000000000..00f6e08c3221a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_qa_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_qa_squad RoBertaForQuestionAnswering from galkowskim +author: John Snow Labs +name: roberta_base_qa_squad +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_qa_squad` is a English model originally trained by galkowskim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_qa_squad_en_5.5.0_3.0_1726175843957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_qa_squad_en_5.5.0_3.0_1726175843957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_qa_squad","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_qa_squad", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_qa_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|461.9 MB| + +## References + +https://huggingface.co/galkowskim/roberta_base_QA_SQUAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline_en.md new file mode 100644 index 00000000000000..3a27fbb9fed259 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline pipeline RoBertaForQuestionAnswering from AyushPJ +author: John Snow Labs +name: roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline` is a English model originally trained by AyushPJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline_en_5.5.0_3.0_1726175883118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline_en_5.5.0_3.0_1726175883118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
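+
+Because this pipeline is built around a MultiDocumentAssembler (see "Included Models" below), `df` is assumed to carry a question column and a context column rather than a single `text` column; a hypothetical sketch (column names follow the standalone question-answering cards and may need adjusting):
+
+```python
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```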
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_ai_club_inductions_21_nlp_roBERTa_base_squad_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.6 MB| + +## References + +https://huggingface.co/AyushPJ/ai-club-inductions-21-nlp-roBERTa-base-squad-v2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-romanized_malayalam_en.md b/docs/_posts/ahmedlone127/2024-09-12-romanized_malayalam_en.md new file mode 100644 index 00000000000000..df7b7dcefc078a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-romanized_malayalam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English romanized_malayalam RoBertaEmbeddings from Jitin +author: John Snow Labs +name: romanized_malayalam +date: 2024-09-12 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`romanized_malayalam` is a English model originally trained by Jitin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/romanized_malayalam_en_5.5.0_3.0_1726113353648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/romanized_malayalam_en_5.5.0_3.0_1726113353648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("romanized_malayalam","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("romanized_malayalam","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
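+
+Each token in the `embeddings` output column carries its vector in the annotation's `embeddings` field; a minimal sketch for peeking at the token vectors:
+
+```python
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=False)
+```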
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|romanized_malayalam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.6 MB| + +## References + +https://huggingface.co/Jitin/romanized-malayalam \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_it.md b/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_it.md new file mode 100644 index 00000000000000..860f666b75ef87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian rulebert_v0_2_k0 XlmRoBertaForSequenceClassification from ribesstefano +author: John Snow Labs +name: rulebert_v0_2_k0 +date: 2024-09-12 +tags: [it, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rulebert_v0_2_k0` is a Italian model originally trained by ribesstefano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rulebert_v0_2_k0_it_5.5.0_3.0_1726146337467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rulebert_v0_2_k0_it_5.5.0_3.0_1726146337467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("rulebert_v0_2_k0","it") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("rulebert_v0_2_k0", "it")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rulebert_v0_2_k0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|870.4 MB| + +## References + +https://huggingface.co/ribesstefano/RuleBert-v0.2-k0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sarcasm_detection_using_bert_en.md b/docs/_posts/ahmedlone127/2024-09-12-sarcasm_detection_using_bert_en.md new file mode 100644 index 00000000000000..3fec5d362da55e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sarcasm_detection_using_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sarcasm_detection_using_bert BertForSequenceClassification from nikesh66 +author: John Snow Labs +name: sarcasm_detection_using_bert +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_using_bert` is a English model originally trained by nikesh66. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_using_bert_en_5.5.0_3.0_1726122868085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_using_bert_en_5.5.0_3.0_1726122868085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_using_bert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_using_bert", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_using_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/nikesh66/Sarcasm-Detection-using-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-second_try_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-second_try_pipeline_en.md new file mode 100644 index 00000000000000..ef6c8a48eddf7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-second_try_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English second_try_pipeline pipeline RoBertaForQuestionAnswering from Jose-Ribeir +author: John Snow Labs +name: second_try_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`second_try_pipeline` is a English model originally trained by Jose-Ribeir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/second_try_pipeline_en_5.5.0_3.0_1726175746785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/second_try_pipeline_en_5.5.0_3.0_1726175746785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("second_try_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("second_try_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|second_try_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/Jose-Ribeir/second_try + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sent_malayalam_bert_ml.md b/docs/_posts/ahmedlone127/2024-09-12-sent_malayalam_bert_ml.md new file mode 100644 index 00000000000000..d4457d325252ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sent_malayalam_bert_ml.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Malayalam sent_malayalam_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_malayalam_bert +date: 2024-09-12 +tags: [ml, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ml +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_malayalam_bert` is a Malayalam model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_malayalam_bert_ml_5.5.0_3.0_1726119505401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_malayalam_bert_ml_5.5.0_3.0_1726119505401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_malayalam_bert","ml") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_malayalam_bert","ml") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
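+
+Here each detected sentence receives a single vector, stored in the annotation's `embeddings` field; a minimal sketch for extracting the sentence text together with its embedding:
+
+```python
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as sentence", "emb.embeddings as vector") \
+    .show(truncate=False)
+```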
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_malayalam_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ml| +|Size:|890.5 MB| + +## References + +https://huggingface.co/l3cube-pune/malayalam-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-spamai_en.md b/docs/_posts/ahmedlone127/2024-09-12-spamai_en.md new file mode 100644 index 00000000000000..b46d1117e8fab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-spamai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spamai BertForSequenceClassification from cybert79 +author: John Snow Labs +name: spamai +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spamai` is a English model originally trained by cybert79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spamai_en_5.5.0_3.0_1726123184786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spamai_en_5.5.0_3.0_1726123184786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("spamai","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("spamai", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spamai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cybert79/spamai \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-spanbert_large_cased_finetuned_ner_conll2003_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-spanbert_large_cased_finetuned_ner_conll2003_pipeline_en.md new file mode 100644 index 00000000000000..a31961a2a398de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-spanbert_large_cased_finetuned_ner_conll2003_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spanbert_large_cased_finetuned_ner_conll2003_pipeline pipeline BertForTokenClassification from MKR-AI +author: John Snow Labs +name: spanbert_large_cased_finetuned_ner_conll2003_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanbert_large_cased_finetuned_ner_conll2003_pipeline` is a English model originally trained by MKR-AI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanbert_large_cased_finetuned_ner_conll2003_pipeline_en_5.5.0_3.0_1726155236584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanbert_large_cased_finetuned_ner_conll2003_pipeline_en_5.5.0_3.0_1726155236584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spanbert_large_cased_finetuned_ner_conll2003_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spanbert_large_cased_finetuned_ner_conll2003_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
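+
+`df` is again assumed to be a DataFrame with a `text` column; the token-level labels produced by the BertForTokenClassification stage can then be browsed from the returned annotations. A minimal sketch:
+
+```python
+df = spark.createDataFrame([["My name is Clara and I live in Berlin."]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```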
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanbert_large_cased_finetuned_ner_conll2003_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/MKR-AI/spanbert-large-cased-finetuned-ner-conll2003 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-squad_clip_text_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-squad_clip_text_1_pipeline_en.md new file mode 100644 index 00000000000000..efd611297fb8dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-squad_clip_text_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squad_clip_text_1_pipeline pipeline RoBertaForQuestionAnswering from AnonymousSub +author: John Snow Labs +name: squad_clip_text_1_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad_clip_text_1_pipeline` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad_clip_text_1_pipeline_en_5.5.0_3.0_1726175351559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad_clip_text_1_pipeline_en_5.5.0_3.0_1726175351559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squad_clip_text_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squad_clip_text_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad_clip_text_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/SQuAD_CLIP_text_1 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-t2_en.md b/docs/_posts/ahmedlone127/2024-09-12-t2_en.md new file mode 100644 index 00000000000000..3f468648d8c002 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-t2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t2 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t2 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t2` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t2_en_5.5.0_3.0_1726118040103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t2_en_5.5.0_3.0_1726118040103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("t2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-trading_hero_llm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-trading_hero_llm_pipeline_en.md new file mode 100644 index 00000000000000..84754c4c3db716 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-trading_hero_llm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trading_hero_llm_pipeline pipeline BertForSequenceClassification from fuchenru +author: John Snow Labs +name: trading_hero_llm_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trading_hero_llm_pipeline` is a English model originally trained by fuchenru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trading_hero_llm_pipeline_en_5.5.0_3.0_1726182218035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trading_hero_llm_pipeline_en_5.5.0_3.0_1726182218035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trading_hero_llm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trading_hero_llm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
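The snippet above assumes a DataFrame `df` already exists; a minimal sketch of building one, assuming the pipeline's DocumentAssembler stage reads a `text` column (the sample sentence is illustrative only):

```python
# Minimal input DataFrame for the pretrained pipeline.
df = spark.createDataFrame([["Shares jumped after strong quarterly results."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns the pipeline stages add
```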
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trading_hero_llm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/fuchenru/Trading-Hero-LLM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-twitter_roberta_base_tempo_wic_latest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-twitter_roberta_base_tempo_wic_latest_pipeline_en.md new file mode 100644 index 00000000000000..275a833047e259 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-twitter_roberta_base_tempo_wic_latest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_tempo_wic_latest_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_tempo_wic_latest_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_tempo_wic_latest_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_tempo_wic_latest_pipeline_en_5.5.0_3.0_1726166086453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_tempo_wic_latest_pipeline_en_5.5.0_3.0_1726166086453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_tempo_wic_latest_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_tempo_wic_latest_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_tempo_wic_latest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-tempo-wic-latest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-12-twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..f4ed9856ff09f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_large_2022_154m RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_large_2022_154m +date: 2024-09-12 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_large_2022_154m` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726185269023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726185269023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("twitter_roberta_large_2022_154m","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("twitter_roberta_large_2022_154m","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
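A short sketch of pulling the vectors back out of the `embeddings` column produced above; the field names follow Spark NLP's standard annotation schema:

```python
from pyspark.sql.functions import explode

# One row per token: the token text is in `result`, its vector in `embeddings`.
# `pipelineDF` comes from the Python example above.
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(truncate=80)
```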
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-twitter_xlm_roberta_bitcoin_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-twitter_xlm_roberta_bitcoin_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..7a360dd6deff2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-twitter_xlm_roberta_bitcoin_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_xlm_roberta_bitcoin_sentiment_pipeline pipeline XlmRoBertaForSequenceClassification from svalabs +author: John Snow Labs +name: twitter_xlm_roberta_bitcoin_sentiment_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_xlm_roberta_bitcoin_sentiment_pipeline` is a English model originally trained by svalabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_bitcoin_sentiment_pipeline_en_5.5.0_3.0_1726146644905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_bitcoin_sentiment_pipeline_en_5.5.0_3.0_1726146644905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_xlm_roberta_bitcoin_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_xlm_roberta_bitcoin_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_xlm_roberta_bitcoin_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/svalabs/twitter-xlm-roberta-bitcoin-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-uzbek_syn_roberta_pipeline_uz.md b/docs/_posts/ahmedlone127/2024-09-12-uzbek_syn_roberta_pipeline_uz.md new file mode 100644 index 00000000000000..6fc1e3dd579f74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-uzbek_syn_roberta_pipeline_uz.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Uzbek uzbek_syn_roberta_pipeline pipeline RoBertaEmbeddings from Mansurbek +author: John Snow Labs +name: uzbek_syn_roberta_pipeline +date: 2024-09-12 +tags: [uz, open_source, pipeline, onnx] +task: Embeddings +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uzbek_syn_roberta_pipeline` is a Uzbek model originally trained by Mansurbek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uzbek_syn_roberta_pipeline_uz_5.5.0_3.0_1726185459136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uzbek_syn_roberta_pipeline_uz_5.5.0_3.0_1726185459136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("uzbek_syn_roberta_pipeline", lang = "uz") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("uzbek_syn_roberta_pipeline", lang = "uz") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uzbek_syn_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uz| +|Size:|471.0 MB| + +## References + +https://huggingface.co/Mansurbek/uz-syn-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_small_cv17_hungarian_hu.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_cv17_hungarian_hu.md new file mode 100644 index 00000000000000..c464240c44cd58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_cv17_hungarian_hu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hungarian whisper_small_cv17_hungarian WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_small_cv17_hungarian +date: 2024-09-12 +tags: [hu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cv17_hungarian` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cv17_hungarian_hu_5.5.0_3.0_1726137589190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cv17_hungarian_hu_5.5.0_3.0_1726137589190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_cv17_hungarian","hu") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_cv17_hungarian", "hu")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
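Both snippets above assume a `data` DataFrame that already holds raw audio. A rough sketch of producing one is below; `librosa`, the file name, and the 16 kHz mono sampling rate are assumptions, not requirements stated on this page:

```python
import librosa

# Read a local file into float samples and wrap them in the column the
# AudioAssembler reads ("audio_content"). "speech.wav" is a placeholder path.
raw_floats, _ = librosa.load("speech.wav", sr=16000, mono=True)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```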
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cv17_hungarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Hungarians/whisper-small-cv17-hu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_small_naija_9_7_2024_pipeline_yo.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_naija_9_7_2024_pipeline_yo.md new file mode 100644 index 00000000000000..81afadd4af4877 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_naija_9_7_2024_pipeline_yo.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Yoruba whisper_small_naija_9_7_2024_pipeline pipeline WhisperForCTC from okezieowen +author: John Snow Labs +name: whisper_small_naija_9_7_2024_pipeline +date: 2024-09-12 +tags: [yo, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: yo +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_naija_9_7_2024_pipeline` is a Yoruba model originally trained by okezieowen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_naija_9_7_2024_pipeline_yo_5.5.0_3.0_1726151007622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_naija_9_7_2024_pipeline_yo_5.5.0_3.0_1726151007622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_naija_9_7_2024_pipeline", lang = "yo") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_naija_9_7_2024_pipeline", lang = "yo") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_naija_9_7_2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|yo| +|Size:|1.7 GB| + +## References + +https://huggingface.co/okezieowen/whisper-small-naija-9-7-2024 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_all_hirosay_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_all_hirosay_pipeline_en.md new file mode 100644 index 00000000000000..aa00228157226f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_all_hirosay_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hirosay_pipeline pipeline XlmRoBertaForTokenClassification from hirosay +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hirosay_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_hirosay_pipeline` is a English model originally trained by hirosay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hirosay_pipeline_en_5.5.0_3.0_1726116554100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hirosay_pipeline_en_5.5.0_3.0_1726116554100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_hirosay_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_hirosay_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hirosay_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/hirosay/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline_en.md new file mode 100644 index 00000000000000..9e8f8b55180798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline pipeline XlmRoBertaForTokenClassification from omersubasi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline` is a English model originally trained by omersubasi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline_en_5.5.0_3.0_1726158896919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline_en_5.5.0_3.0_1726158896919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_omersubasi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/omersubasi/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_hcy5561_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_hcy5561_en.md new file mode 100644 index 00000000000000..b11dda08456ffc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_hcy5561_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_hcy5561 XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_hcy5561 +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_hcy5561` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_hcy5561_en_5.5.0_3.0_1726157472950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_hcy5561_en_5.5.0_3.0_1726157472950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_hcy5561","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_hcy5561", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
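After the pipeline above runs, the per-token predictions can be read straight off the `token` and `ner` columns; a minimal sketch:

```python
# Tokens and their predicted tags, kept as aligned arrays per input row.
# `pipelineDF` comes from the Python example above.
pipelineDF.selectExpr("token.result AS tokens", "ner.result AS ner_tags").show(truncate=False)
```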
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_hcy5561| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_smilingface88_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_smilingface88_en.md new file mode 100644 index 00000000000000..715542e6006a17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_smilingface88_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_smilingface88 XlmRoBertaForTokenClassification from smilingface88 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_smilingface88 +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_smilingface88` is a English model originally trained by smilingface88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smilingface88_en_5.5.0_3.0_1726158972296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smilingface88_en_5.5.0_3.0_1726158972296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_smilingface88","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_smilingface88", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_smilingface88| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/smilingface88/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline_en.md new file mode 100644 index 00000000000000..bd7e25c3de06d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline pipeline XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline` is a English model originally trained by jbreunig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline_en_5.5.0_3.0_1726130889423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline_en_5.5.0_3.0_1726130889423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jbreunig_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_obong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_obong_pipeline_en.md new file mode 100644 index 00000000000000..283ee793d9a7cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_obong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_obong_pipeline pipeline XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_obong_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_obong_pipeline` is a English model originally trained by obong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_obong_pipeline_en_5.5.0_3.0_1726164425817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_obong_pipeline_en_5.5.0_3.0_1726164425817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_obong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_obong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_obong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_pstary_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_pstary_en.md new file mode 100644 index 00000000000000..b0619e0582a365 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_pstary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_pstary XlmRoBertaForTokenClassification from Pstary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_pstary +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_pstary` is a English model originally trained by Pstary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pstary_en_5.5.0_3.0_1726160220992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pstary_en_5.5.0_3.0_1726160220992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_pstary","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_pstary", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_pstary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Pstary/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_r45289_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_r45289_en.md new file mode 100644 index 00000000000000..983c8c03df8b8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_r45289_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_r45289 XlmRoBertaForTokenClassification from r45289 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_r45289 +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_r45289` is a English model originally trained by r45289. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_r45289_en_5.5.0_3.0_1726116427686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_r45289_en_5.5.0_3.0_1726116427686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_r45289","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_r45289", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_r45289| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/r45289/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_r45289_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_r45289_pipeline_en.md new file mode 100644 index 00000000000000..52101a5d2333f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_r45289_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_r45289_pipeline pipeline XlmRoBertaForTokenClassification from r45289 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_r45289_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_r45289_pipeline` is a English model originally trained by r45289. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_r45289_pipeline_en_5.5.0_3.0_1726116492536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_r45289_pipeline_en_5.5.0_3.0_1726116492536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_r45289_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_r45289_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_r45289_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/r45289/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline_en.md new file mode 100644 index 00000000000000..e98b05f5776580 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline_en_5.5.0_3.0_1726146732521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline_en_5.5.0_3.0_1726146732521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_replace_synonym_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|799.4 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_replace_synonym-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-all_roberta_large_v1_travel_3_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-13-all_roberta_large_v1_travel_3_16_5_en.md new file mode 100644 index 00000000000000..b9105b6074a31b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-all_roberta_large_v1_travel_3_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_3_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_3_16_5 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_3_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_3_16_5_en_5.5.0_3.0_1726187155095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_3_16_5_en_5.5.0_3.0_1726187155095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_3_16_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_3_16_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_3_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-3-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-alpha_bert_en.md b/docs/_posts/ahmedlone127/2024-09-13-alpha_bert_en.md new file mode 100644 index 00000000000000..a6127b41b27b9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-alpha_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alpha_bert BertForSequenceClassification from Luciferalive +author: John Snow Labs +name: alpha_bert +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alpha_bert` is a English model originally trained by Luciferalive. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alpha_bert_en_5.5.0_3.0_1726201294081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alpha_bert_en_5.5.0_3.0_1726201294081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("alpha_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("alpha_bert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alpha_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Luciferalive/Alpha_Bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-another_local_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-13-another_local_pipeline_it.md new file mode 100644 index 00000000000000..0e3c079e230fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-another_local_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian another_local_pipeline pipeline WhisperForCTC from SaladSlayer00 +author: John Snow Labs +name: another_local_pipeline +date: 2024-09-13 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`another_local_pipeline` is a Italian model originally trained by SaladSlayer00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/another_local_pipeline_it_5.5.0_3.0_1726220182160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/another_local_pipeline_it_5.5.0_3.0_1726220182160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("another_local_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("another_local_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|another_local_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SaladSlayer00/another_local + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-13-babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad_en.md new file mode 100644 index 00000000000000..0545906e89f6f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726199032052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726199032052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
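The predicted answer can then be read from the `answer` column configured above; a short follow-up sketch (the `metadata` field typically carries the span score, depending on the export):

```python
# `answer` is an array of annotations per row; `result` holds the predicted span text.
pipelineDF.selectExpr("explode(answer) as ans") \
    .selectExpr("ans.result as predicted_answer", "ans.metadata as details") \
    .show(truncate=False)
```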
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_1_25m_aochildes_french1_25m_with_masking_seed3_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|31.9 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_1.25M_aochildes-french1.25M-with-Masking-seed3-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..abe31e5671549f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726198578743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726198578743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
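The snippet above assumes an existing DataFrame `df`. For this question-answering pipeline, `df` needs the two columns consumed by the bundled MultiDocumentAssembler; the column names `question` and `context` below are an assumption based on the usual convention for these exports:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input column names for the bundled MultiDocumentAssembler.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # confirms the output column (typically "answer") before selecting it
```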
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia1_1_25m_wikipedia_french1_25m_with_masking_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia1_1.25M_wikipedia_french1.25M-with-Masking-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-13-bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline_xx.md new file mode 100644 index 00000000000000..8c9033f279710b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline pipeline BertForTokenClassification from GuiTap +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline +date: 2024-09-13 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline` is a Multilingual model originally trained by GuiTap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline_xx_5.5.0_3.0_1726267484407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline_xx_5.5.0_3.0_1726267484407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
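A minimal end-to-end sketch for this pipeline, assuming a `text` input column (the standard convention for these exports) and using `printSchema()` to discover the NER output columns rather than guessing their names:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["A bacia do rio Amazonas cobre formações geológicas antigas."]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline", lang="xx")
result = pipeline.transform(df)

result.printSchema()       # lists the annotation columns produced by the saved stages
result.show(truncate=False)
```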
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_ner_geocorpus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.6 MB| + +## References + +https://huggingface.co/GuiTap/bert-base-multilingual-uncased-finetuned-ner-geocorpus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bert_large_uncased_sst2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-bert_large_uncased_sst2_pipeline_en.md new file mode 100644 index 00000000000000..47db4fd843d65a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bert_large_uncased_sst2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_sst2_pipeline pipeline BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_large_uncased_sst2_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_sst2_pipeline` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sst2_pipeline_en_5.5.0_3.0_1726201457369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sst2_pipeline_en_5.5.0_3.0_1726201457369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_sst2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_sst2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
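As a runnable sketch, the `df` above can be a plain text DataFrame; the `class` output column name below is an assumption based on how these classification pipelines are usually exported, and `printSchema()` can confirm it:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame(
    [["I love spark-nlp"], ["This movie was a waste of time."]]
).toDF("text")

pipeline = PretrainedPipeline("bert_large_uncased_sst2_pipeline", lang="en")
annotations = pipeline.transform(df)

# Assumed output column name for the bundled BertForSequenceClassification stage.
annotations.selectExpr("text", "explode(`class`.result) as sentiment").show(truncate=False)
```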
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_sst2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bert_vllm_gemma2b_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-bert_vllm_gemma2b_7_pipeline_en.md new file mode 100644 index 00000000000000..1b8ca9effc2f73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bert_vllm_gemma2b_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_7_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_7_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_7_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_7_pipeline_en_5.5.0_3.0_1726262673270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_7_pipeline_en_5.5.0_3.0_1726262673270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-brwac_v1_4__checkpoint12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-brwac_v1_4__checkpoint12_pipeline_en.md new file mode 100644 index 00000000000000..70daa3c3d42e1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-brwac_v1_4__checkpoint12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_4__checkpoint12_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_4__checkpoint12_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_4__checkpoint12_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_4__checkpoint12_pipeline_en_5.5.0_3.0_1726197335882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_4__checkpoint12_pipeline_en_5.5.0_3.0_1726197335882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_4__checkpoint12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_4__checkpoint12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
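To pull token vectors out of the transformed DataFrame, something like the following can be used (a sketch; the `embeddings` column name is an assumption that `printSchema()` can confirm):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("brwac_v1_4__checkpoint12_pipeline", lang="en")
result = pipeline.transform(df)

# Each annotation carries the token text in `result` and its vector in `embeddings`.
result.selectExpr("explode(embeddings) as e") \
      .selectExpr("e.result as token", "e.embeddings as vector") \
      .show(truncate=80)
```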
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_4__checkpoint12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_4__checkpoint12 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_djhenny_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_djhenny_en.md new file mode 100644 index 00000000000000..18b5a4e4a4afe2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_djhenny_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_djhenny DistilBertForQuestionAnswering from djhenny +author: John Snow Labs +name: burmese_awesome_qa_model_djhenny +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_djhenny` is a English model originally trained by djhenny. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_djhenny_en_5.5.0_3.0_1726245067308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_djhenny_en_5.5.0_3.0_1726245067308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_djhenny","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_djhenny", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
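For single-example inference, the fitted `pipelineModel` can also be wrapped in a LightPipeline, which accepts the question and context directly (a sketch):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# fullAnnotate takes the two inputs of a question-answering pipeline as separate strings;
# the prediction is returned under the "answer" output column configured above.
result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(result[0]["answer"][0].result)
```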
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_djhenny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/djhenny/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_pipeline_en.md new file mode 100644 index 00000000000000..f5f7b3e9a8335d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_maniack_pipeline pipeline DistilBertForQuestionAnswering from maniack +author: John Snow Labs +name: burmese_awesome_qa_model_maniack_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_maniack_pipeline` is a English model originally trained by maniack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_maniack_pipeline_en_5.5.0_3.0_1726245525137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_maniack_pipeline_en_5.5.0_3.0_1726245525137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_maniack_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_maniack_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_maniack_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/maniack/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-culturebank_controversial_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-13-culturebank_controversial_classifier_en.md new file mode 100644 index 00000000000000..993cde6a963e59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-culturebank_controversial_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English culturebank_controversial_classifier DistilBertForSequenceClassification from SALT-NLP +author: John Snow Labs +name: culturebank_controversial_classifier +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`culturebank_controversial_classifier` is a English model originally trained by SALT-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/culturebank_controversial_classifier_en_5.5.0_3.0_1726262006860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/culturebank_controversial_classifier_en_5.5.0_3.0_1726262006860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("culturebank_controversial_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("culturebank_controversial_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
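The predicted label and its confidence scores can then be read from the `class` column set above; a short follow-up sketch (metadata keys depend on the model's label set):

```python
pipelineDF.selectExpr("explode(`class`) as prediction") \
    .selectExpr("prediction.result as label", "prediction.metadata as scores") \
    .show(truncate=False)
```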
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|culturebank_controversial_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SALT-NLP/CultureBank-Controversial-Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_em_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_em_en.md new file mode 100644 index 00000000000000..772ac3fe4cc92e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_em_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_em DeBertaForSequenceClassification from keremp +author: John Snow Labs +name: deberta_em +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_em` is a English model originally trained by keremp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_em_en_5.5.0_3.0_1726190724584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_em_en_5.5.0_3.0_1726190724584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_em","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_em", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_em| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|562.7 MB| + +## References + +https://huggingface.co/keremp/deberta-em \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_hallucination_eval_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_hallucination_eval_en.md new file mode 100644 index 00000000000000..cbdc09fb647b32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_hallucination_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_hallucination_eval DeBertaForSequenceClassification from mathislucka +author: John Snow Labs +name: deberta_hallucination_eval +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_hallucination_eval` is a English model originally trained by mathislucka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_hallucination_eval_en_5.5.0_3.0_1726200231054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_hallucination_eval_en_5.5.0_3.0_1726200231054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_hallucination_eval","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_hallucination_eval", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_hallucination_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|690.6 MB| + +## References + +https://huggingface.co/mathislucka/deberta-hallucination-eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_dialoconan_cls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_dialoconan_cls_pipeline_en.md new file mode 100644 index 00000000000000..f198ad272e3a1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_dialoconan_cls_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_base_dialoconan_cls_pipeline pipeline DeBertaForSequenceClassification from xoyeop +author: John Snow Labs +name: deberta_v3_base_dialoconan_cls_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_dialoconan_cls_pipeline` is a English model originally trained by xoyeop. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_dialoconan_cls_pipeline_en_5.5.0_3.0_1726260318358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_dialoconan_cls_pipeline_en_5.5.0_3.0_1726260318358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_base_dialoconan_cls_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_base_dialoconan_cls_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_dialoconan_cls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|589.6 MB| + +## References + +https://huggingface.co/xoyeop/deberta-v3-base-DIALOCONAN-CLS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_finetuned_fever_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_finetuned_fever_en.md new file mode 100644 index 00000000000000..3713792113fdd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_finetuned_fever_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_base_finetuned_fever DeBertaForSequenceClassification from learn3r +author: John Snow Labs +name: deberta_v3_base_finetuned_fever +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_finetuned_fever` is a English model originally trained by learn3r. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_finetuned_fever_en_5.5.0_3.0_1726199967175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_finetuned_fever_en_5.5.0_3.0_1726199967175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_finetuned_fever","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_finetuned_fever", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_finetuned_fever| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|627.5 MB| + +## References + +https://huggingface.co/learn3r/deberta-v3-base-finetuned-fever \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline_en.md new file mode 100644 index 00000000000000..745966576869a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline pipeline DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline_en_5.5.0_3.0_1726199879491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline_en_5.5.0_3.0_1726199879491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-large-survey-related_passage_consistency-rater-half-gpt4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_small_tweet_about_disaster_oriya_not_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_small_tweet_about_disaster_oriya_not_en.md new file mode 100644 index 00000000000000..9fae70d2ce67aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_small_tweet_about_disaster_oriya_not_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_small_tweet_about_disaster_oriya_not DeBertaForSequenceClassification from DunnBC22 +author: John Snow Labs +name: deberta_v3_small_tweet_about_disaster_oriya_not +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_small_tweet_about_disaster_oriya_not` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_small_tweet_about_disaster_oriya_not_en_5.5.0_3.0_1726200190933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_small_tweet_about_disaster_oriya_not_en_5.5.0_3.0_1726200190933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_small_tweet_about_disaster_oriya_not","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_small_tweet_about_disaster_oriya_not", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_small_tweet_about_disaster_oriya_not| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/DunnBC22/deberta-v3-small-Tweet_About_Disaster_Or_Not \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-dictabert_finetuned_dss_maskedlm_en.md b/docs/_posts/ahmedlone127/2024-09-13-dictabert_finetuned_dss_maskedlm_en.md new file mode 100644 index 00000000000000..b48481fa7ae557 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-dictabert_finetuned_dss_maskedlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dictabert_finetuned_dss_maskedlm BertEmbeddings from yonatanlou +author: John Snow Labs +name: dictabert_finetuned_dss_maskedlm +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_finetuned_dss_maskedlm` is a English model originally trained by yonatanlou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_finetuned_dss_maskedlm_en_5.5.0_3.0_1726234423078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_finetuned_dss_maskedlm_en_5.5.0_3.0_1726234423078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("dictabert_finetuned_dss_maskedlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("dictabert_finetuned_dss_maskedlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
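If plain Spark vectors are needed downstream (for example for MLlib stages), an EmbeddingsFinisher can be appended to the pipeline defined above; a minimal sketch reusing those stages:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Converts the annotation-wrapped embeddings into plain Spark vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(finished_embeddings) as vector").show(truncate=80)
```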
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_finetuned_dss_maskedlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|689.1 MB| + +## References + +https://huggingface.co/yonatanlou/dictabert-finetuned-DSS-maskedLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_multilingual_cased_fine_ptbr_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_multilingual_cased_fine_ptbr_pipeline_xx.md new file mode 100644 index 00000000000000..99491ca3005361 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_multilingual_cased_fine_ptbr_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_fine_ptbr_pipeline pipeline DistilBertForSequenceClassification from Laysson +author: John Snow Labs +name: distilbert_base_multilingual_cased_fine_ptbr_pipeline +date: 2024-09-13 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_fine_ptbr_pipeline` is a Multilingual model originally trained by Laysson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_fine_ptbr_pipeline_xx_5.5.0_3.0_1726262918619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_fine_ptbr_pipeline_xx_5.5.0_3.0_1726262918619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_fine_ptbr_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_fine_ptbr_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_fine_ptbr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Laysson/distilbert-base-multilingual-cased-fine-ptbr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline_en.md new file mode 100644 index 00000000000000..b5165baf8848b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline pipeline DistilBertForSequenceClassification from Abdelwahab201 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline` is a English model originally trained by Abdelwahab201. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline_en_5.5.0_3.0_1726262470362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline_en_5.5.0_3.0_1726262470362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_abdelwahab201_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abdelwahab201/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_admin_0_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_admin_0_en.md new file mode 100644 index 00000000000000..e1d5127d649fa7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_admin_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_admin_0 DistilBertForSequenceClassification from admin-0 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_admin_0 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_admin_0` is a English model originally trained by admin-0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_admin_0_en_5.5.0_3.0_1726262310336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_admin_0_en_5.5.0_3.0_1726262310336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_admin_0","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_admin_0", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
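For quick checks outside a DataFrame, the fitted `pipelineModel` can be wrapped in a LightPipeline (a sketch; the dictionary key matches the `class` output column set above):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```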
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_admin_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/admin-0/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_esperesa_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_esperesa_en.md new file mode 100644 index 00000000000000..1d4bbb121f6f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_esperesa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_esperesa DistilBertForSequenceClassification from esperesa +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_esperesa +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_esperesa` is a English model originally trained by esperesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_esperesa_en_5.5.0_3.0_1726262310680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_esperesa_en_5.5.0_3.0_1726262310680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_esperesa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_esperesa", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_esperesa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/esperesa/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_nli276_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_nli276_en.md new file mode 100644 index 00000000000000..6b7c70280a3d17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_nli276_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nli276 DistilBertForSequenceClassification from nli276 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nli276 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nli276` is a English model originally trained by nli276. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nli276_en_5.5.0_3.0_1726262784136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nli276_en_5.5.0_3.0_1726262784136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nli276","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nli276", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nli276| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nli276/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline_en.md new file mode 100644 index 00000000000000..8843c869fa1608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline pipeline DistilBertForSequenceClassification from rizwan-ai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline` is a English model originally trained by rizwan-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline_en_5.5.0_3.0_1726262286585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline_en_5.5.0_3.0_1726262286585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
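`df` is not defined in the snippet above; a minimal sketch of preparing an input DataFrame and reading the predictions (the `class` column name is assumed from the DistilBertForSequenceClassification stage listed under Included Models):

```python
# Hypothetical input: the pretrained pipeline expects a string column named "text".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
# The sequence classifier writes its predicted label into the "class" annotation column.
annotations.select("text", "class.result").show(truncate=False)
```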
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rizwan_ai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rizwan-ai/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_xiumu1988_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_xiumu1988_en.md new file mode 100644 index 00000000000000..31638e50003135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_xiumu1988_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_xiumu1988 DistilBertForSequenceClassification from xiumu1988 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_xiumu1988 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_xiumu1988` is a English model originally trained by xiumu1988. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_xiumu1988_en_5.5.0_3.0_1726242468281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_xiumu1988_en_5.5.0_3.0_1726242468281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_xiumu1988","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_xiumu1988", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_xiumu1988| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/xiumu1988/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_qa_model_smrynrz0220_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_qa_model_smrynrz0220_pipeline_en.md new file mode 100644 index 00000000000000..679aafa710b8cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_qa_model_smrynrz0220_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_qa_model_smrynrz0220_pipeline pipeline DistilBertForQuestionAnswering from smrynrz0220 +author: John Snow Labs +name: distilbert_qa_model_smrynrz0220_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_qa_model_smrynrz0220_pipeline` is a English model originally trained by smrynrz0220. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_qa_model_smrynrz0220_pipeline_en_5.5.0_3.0_1726244970619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_qa_model_smrynrz0220_pipeline_en_5.5.0_3.0_1726244970619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_qa_model_smrynrz0220_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_qa_model_smrynrz0220_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_qa_model_smrynrz0220_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/smrynrz0220/distilbert_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-dopamin_python_summary_en.md b/docs/_posts/ahmedlone127/2024-09-13-dopamin_python_summary_en.md new file mode 100644 index 00000000000000..5d845c74dbfc0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-dopamin_python_summary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dopamin_python_summary RoBertaForSequenceClassification from Fsoft-AIC +author: John Snow Labs +name: dopamin_python_summary +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dopamin_python_summary` is a English model originally trained by Fsoft-AIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dopamin_python_summary_en_5.5.0_3.0_1726186956446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dopamin_python_summary_en_5.5.0_3.0_1726186956446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("dopamin_python_summary","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("dopamin_python_summary", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dopamin_python_summary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/Fsoft-AIC/dopamin-python-summary \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-econberta_en.md b/docs/_posts/ahmedlone127/2024-09-13-econberta_en.md new file mode 100644 index 00000000000000..43ec63ed4c1316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-econberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English econberta RoBertaEmbeddings from ThePixOne +author: John Snow Labs +name: econberta +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`econberta` is a English model originally trained by ThePixOne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/econberta_en_5.5.0_3.0_1726264616807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/econberta_en_5.5.0_3.0_1726264616807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("econberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("econberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
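To get at the token vectors produced above, the annotations in the `embeddings` column can be exploded; a small inspection sketch (field names follow the standard Spark NLP annotation schema):

```python
# Each annotation holds the token text in `result` and its vector in `embeddings`.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```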
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|econberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/ThePixOne/EconBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-financial_sentiment_model_2500_samples_fine_tune_en.md b/docs/_posts/ahmedlone127/2024-09-13-financial_sentiment_model_2500_samples_fine_tune_en.md new file mode 100644 index 00000000000000..a35d05570b7ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-financial_sentiment_model_2500_samples_fine_tune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_sentiment_model_2500_samples_fine_tune DistilBertForSequenceClassification from kevinwlip +author: John Snow Labs +name: financial_sentiment_model_2500_samples_fine_tune +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_sentiment_model_2500_samples_fine_tune` is a English model originally trained by kevinwlip. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_sentiment_model_2500_samples_fine_tune_en_5.5.0_3.0_1726262654591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_sentiment_model_2500_samples_fine_tune_en_5.5.0_3.0_1726262654591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_model_2500_samples_fine_tune","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_model_2500_samples_fine_tune", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_sentiment_model_2500_samples_fine_tune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kevinwlip/financial-sentiment-model-2500-samples-fine-tune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-fine_tuned_distilbert_base_uncased_beehzod_en.md b/docs/_posts/ahmedlone127/2024-09-13-fine_tuned_distilbert_base_uncased_beehzod_en.md new file mode 100644 index 00000000000000..4356cbb2a06749 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-fine_tuned_distilbert_base_uncased_beehzod_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_distilbert_base_uncased_beehzod DistilBertForSequenceClassification from Beehzod +author: John Snow Labs +name: fine_tuned_distilbert_base_uncased_beehzod +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_base_uncased_beehzod` is a English model originally trained by Beehzod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_base_uncased_beehzod_en_5.5.0_3.0_1726262127150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_base_uncased_beehzod_en_5.5.0_3.0_1726262127150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_distilbert_base_uncased_beehzod","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_distilbert_base_uncased_beehzod", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_base_uncased_beehzod| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beehzod/fine_tuned_distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-finetuned_model_on_custom_dataset_700k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-finetuned_model_on_custom_dataset_700k_pipeline_en.md new file mode 100644 index 00000000000000..9cd651ec21b473 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-finetuned_model_on_custom_dataset_700k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_model_on_custom_dataset_700k_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: finetuned_model_on_custom_dataset_700k_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_on_custom_dataset_700k_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_on_custom_dataset_700k_pipeline_en_5.5.0_3.0_1726242888744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_on_custom_dataset_700k_pipeline_en_5.5.0_3.0_1726242888744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_model_on_custom_dataset_700k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_model_on_custom_dataset_700k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_on_custom_dataset_700k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Finetuned-model-on-custom-dataset-700k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline_en.md new file mode 100644 index 00000000000000..0b41675f2ee62b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline pipeline DistilBertForSequenceClassification from vaishnavi514 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline` is a English model originally trained by vaishnavi514. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline_en_5.5.0_3.0_1726242686871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline_en_5.5.0_3.0_1726242686871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_vaishnavi514_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vaishnavi514/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-first_trained_trainer_finetuned_1_en.md b/docs/_posts/ahmedlone127/2024-09-13-first_trained_trainer_finetuned_1_en.md new file mode 100644 index 00000000000000..53ef81951c97ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-first_trained_trainer_finetuned_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English first_trained_trainer_finetuned_1 RoBertaForQuestionAnswering from tshamardina +author: John Snow Labs +name: first_trained_trainer_finetuned_1 +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`first_trained_trainer_finetuned_1` is a English model originally trained by tshamardina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/first_trained_trainer_finetuned_1_en_5.5.0_3.0_1726206605117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/first_trained_trainer_finetuned_1_en_5.5.0_3.0_1726206605117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("first_trained_trainer_finetuned_1","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("first_trained_trainer_finetuned_1", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
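The predicted span ends up in the `answer` annotation column; a quick way to read the extracted text:

```python
# One answer string per question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```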
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|first_trained_trainer_finetuned_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tshamardina/first_trained_trainer-finetuned-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_2000_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_2000_pipeline_de.md new file mode 100644 index 00000000000000..659bbdbb9290cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_2000_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German gqa_roberta_german_legal_squad_part_augmented_2000_pipeline pipeline RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: gqa_roberta_german_legal_squad_part_augmented_2000_pipeline +date: 2024-09-13 +tags: [de, open_source, pipeline, onnx] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gqa_roberta_german_legal_squad_part_augmented_2000_pipeline` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_2000_pipeline_de_5.5.0_3.0_1726231416887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_2000_pipeline_de_5.5.0_3.0_1726231416887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gqa_roberta_german_legal_squad_part_augmented_2000_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gqa_roberta_german_legal_squad_part_augmented_2000_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gqa_roberta_german_legal_squad_part_augmented_2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/GQA_RoBERTa_German_legal_SQuAD_part_augmented_2000 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-hunembert3_hu.md b/docs/_posts/ahmedlone127/2024-09-13-hunembert3_hu.md new file mode 100644 index 00000000000000..1d680204005874 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-hunembert3_hu.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hungarian hunembert3 BertForSequenceClassification from poltextlab +author: John Snow Labs +name: hunembert3 +date: 2024-09-13 +tags: [hu, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hunembert3` is a Hungarian model originally trained by poltextlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hunembert3_hu_5.5.0_3.0_1726201684288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hunembert3_hu_5.5.0_3.0_1726201684288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hunembert3","hu") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hunembert3", "hu")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hunembert3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|hu| +|Size:|414.7 MB| + +## References + +https://huggingface.co/poltextlab/HunEmBERT3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline_en.md new file mode 100644 index 00000000000000..22292975d4b50c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline pipeline RoBertaForSequenceClassification from dfelorza +author: John Snow Labs +name: imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline` is a English model originally trained by dfelorza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline_en_5.5.0_3.0_1726227329308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline_en_5.5.0_3.0_1726227329308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_roberta_v01_clf_finetuning_dfelorza_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|300.4 MB| + +## References + +https://huggingface.co/dfelorza/imdbreviews_classification_roberta_v01_clf_finetuning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-isom5240_task1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-isom5240_task1_pipeline_en.md new file mode 100644 index 00000000000000..fd2872beac5129 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-isom5240_task1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English isom5240_task1_pipeline pipeline DistilBertForSequenceClassification from jackjc +author: John Snow Labs +name: isom5240_task1_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`isom5240_task1_pipeline` is a English model originally trained by jackjc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/isom5240_task1_pipeline_en_5.5.0_3.0_1726242991859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/isom5240_task1_pipeline_en_5.5.0_3.0_1726242991859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("isom5240_task1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("isom5240_task1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|isom5240_task1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jackjc/ISOM5240_Task1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-khipu_finetuned_amazon_reviews_multi_andrescastro_itm_en.md b/docs/_posts/ahmedlone127/2024-09-13-khipu_finetuned_amazon_reviews_multi_andrescastro_itm_en.md new file mode 100644 index 00000000000000..2b933a13111078 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-khipu_finetuned_amazon_reviews_multi_andrescastro_itm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English khipu_finetuned_amazon_reviews_multi_andrescastro_itm RoBertaForSequenceClassification from andrescastro-itm +author: John Snow Labs +name: khipu_finetuned_amazon_reviews_multi_andrescastro_itm +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khipu_finetuned_amazon_reviews_multi_andrescastro_itm` is a English model originally trained by andrescastro-itm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_andrescastro_itm_en_5.5.0_3.0_1726247064927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_andrescastro_itm_en_5.5.0_3.0_1726247064927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("khipu_finetuned_amazon_reviews_multi_andrescastro_itm","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("khipu_finetuned_amazon_reviews_multi_andrescastro_itm", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khipu_finetuned_amazon_reviews_multi_andrescastro_itm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|428.7 MB| + +## References + +https://huggingface.co/andrescastro-itm/khipu-finetuned-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-log_analysis_model_distilbert_teoogherghi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-log_analysis_model_distilbert_teoogherghi_pipeline_en.md new file mode 100644 index 00000000000000..d4923091ef524f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-log_analysis_model_distilbert_teoogherghi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English log_analysis_model_distilbert_teoogherghi_pipeline pipeline DistilBertForSequenceClassification from teoogherghi +author: John Snow Labs +name: log_analysis_model_distilbert_teoogherghi_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`log_analysis_model_distilbert_teoogherghi_pipeline` is a English model originally trained by teoogherghi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/log_analysis_model_distilbert_teoogherghi_pipeline_en_5.5.0_3.0_1726262670569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/log_analysis_model_distilbert_teoogherghi_pipeline_en_5.5.0_3.0_1726262670569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("log_analysis_model_distilbert_teoogherghi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("log_analysis_model_distilbert_teoogherghi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|log_analysis_model_distilbert_teoogherghi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/teoogherghi/Log-Analysis-Model-DistilBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-marathi_marh_val_j_mr.md b/docs/_posts/ahmedlone127/2024-09-13-marathi_marh_val_j_mr.md new file mode 100644 index 00000000000000..0b7805c4faa0ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-marathi_marh_val_j_mr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Marathi marathi_marh_val_j WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_j +date: 2024-09-13 +tags: [mr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_j` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_j_mr_5.5.0_3.0_1726220393371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_j_mr_5.5.0_3.0_1726220393371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("marathi_marh_val_j","mr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("marathi_marh_val_j", "mr")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
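`data` is left undefined in the snippet above; the AudioAssembler expects a column holding the audio samples as an array of floats. A sketch of building such a DataFrame with librosa (the file path is a placeholder; 16 kHz is the sampling rate Whisper models expect):

```python
import librosa

# Load a mono waveform resampled to 16 kHz; "sample_mr.wav" is a hypothetical path.
waveform, _ = librosa.load("sample_mr.wav", sr=16000)
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```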
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_j| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-j \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487_en.md b/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487_en.md new file mode 100644 index 00000000000000..27260b6ffcce54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487 MarianTransformer from abhikr487 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487 +date: 2024-09-13 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487` is a English model originally trained by abhikr487. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487_en_5.5.0_3.0_1726269414265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487_en_5.5.0_3.0_1726269414265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
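Once the pipeline has run, the translations can be read back from the `translation` column:

```python
# One translated string per detected sentence.
pipelineDF.select("translation.result").show(truncate=False)
```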
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_abhikr487| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/abhikr487/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline_en.md new file mode 100644 index 00000000000000..dd0ad5318afcdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline pipeline MarianTransformer from seokjin1013 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline` is a English model originally trained by seokjin1013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline_en_5.5.0_3.0_1726192406838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline_en_5.5.0_3.0_1726192406838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_seokjin1013_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/seokjin1013/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-msc_baseline_marian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-msc_baseline_marian_pipeline_en.md new file mode 100644 index 00000000000000..b2e36b11854938 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-msc_baseline_marian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English msc_baseline_marian_pipeline pipeline MarianTransformer from EricPeter +author: John Snow Labs +name: msc_baseline_marian_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`msc_baseline_marian_pipeline` is a English model originally trained by EricPeter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/msc_baseline_marian_pipeline_en_5.5.0_3.0_1726192416577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/msc_baseline_marian_pipeline_en_5.5.0_3.0_1726192416577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("msc_baseline_marian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("msc_baseline_marian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|msc_baseline_marian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|530.4 MB| + +## References + +https://huggingface.co/EricPeter/msc-baseline-marian + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-nepal_bhasa_repo_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-nepal_bhasa_repo_last_pipeline_en.md new file mode 100644 index 00000000000000..871efdd15840d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-nepal_bhasa_repo_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_repo_last_pipeline pipeline DistilBertForSequenceClassification from aghiles-s +author: John Snow Labs +name: nepal_bhasa_repo_last_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_repo_last_pipeline` is a English model originally trained by aghiles-s. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_repo_last_pipeline_en_5.5.0_3.0_1726242494766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_repo_last_pipeline_en_5.5.0_3.0_1726242494766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_repo_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_repo_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
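+
+For quick checks on single strings, `PretrainedPipeline` also exposes `annotate`. A minimal sketch, assuming the classifier stage writes its label to a column named `class` (an assumption — print the returned keys to confirm on your run):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+pipeline = PretrainedPipeline("nepal_bhasa_repo_last_pipeline", lang="en")
+
+result = pipeline.annotate("I really enjoyed this product.")
+print(result.keys())        # actual output columns of the bundled stages
+print(result.get("class"))  # predicted label(s), if the classifier writes to "class"
+```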
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_repo_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aghiles-s/new_repo_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock_en.md b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock_en.md new file mode 100644 index 00000000000000..8b6c450ffad577 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock MarianTransformer from ericlaycock +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock +date: 2024-09-13 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock` is a English model originally trained by ericlaycock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock_en_5.5.0_3.0_1726268929973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock_en_5.5.0_3.0_1726268929973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ericlaycock| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/ericlaycock/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield_en.md b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield_en.md new file mode 100644 index 00000000000000..57ab48c13c3bb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield MarianTransformer from vsweatherfield +author: John Snow Labs +name: opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield +date: 2024-09-13 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield` is a English model originally trained by vsweatherfield. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield_en_5.5.0_3.0_1726268756447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield_en_5.5.0_3.0_1726268756447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_russian_english_finetuned_russian_tonga_tonga_islands_english_vsweatherfield| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|526.4 MB| + +## References + +https://huggingface.co/vsweatherfield/opus-mt-ru-en-finetuned-ru-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..68843b91570ff9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_turkish_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_turkish_english_finetuned_npomo_english_10_epochs +date: 2024-09-13 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_turkish_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726268563937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726268563937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_turkish_english_finetuned_npomo_english_10_epochs","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_turkish_english_finetuned_npomo_english_10_epochs","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
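+
+Each row of the `translation` column above is an array of annotation structs. A short follow-up sketch for pulling the plain translated strings out of `pipelineDF` (column names follow the example; adjust them if you rename the output columns):
+
+```python
+# One row per translated sentence, as plain text
+pipelineDF.selectExpr("explode(translation.result) as translated_text").show(truncate=False)
+```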
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_turkish_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|525.2 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-tr-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline_en.md new file mode 100644 index 00000000000000..c6382e7726820f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline pipeline MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline_en_5.5.0_3.0_1726268589897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline_en_5.5.0_3.0_1726268589897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_turkish_english_finetuned_npomo_english_10_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|525.7 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-tr-en-finetuned-npomo-en-10-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-qa_for_somali_model_en.md b/docs/_posts/ahmedlone127/2024-09-13-qa_for_somali_model_en.md new file mode 100644 index 00000000000000..3074b9eccc5485 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-qa_for_somali_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_for_somali_model RoBertaForQuestionAnswering from Raylei231 +author: John Snow Labs +name: qa_for_somali_model +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_for_somali_model` is a English model originally trained by Raylei231. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_for_somali_model_en_5.5.0_3.0_1726207208845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_for_somali_model_en_5.5.0_3.0_1726207208845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("qa_for_somali_model","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("qa_for_somali_model", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
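+
+The predicted span lands in the `answer` column as an annotation array. A small sketch for reading it back as plain text (the column name follows the example above):
+
+```python
+# Extract the predicted answer string(s)
+pipelineDF.selectExpr("explode(answer.result) as predicted_answer").show(truncate=False)
+```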
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_for_somali_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/Raylei231/QA_for_SO_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-random_english_tonga_tonga_islands_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-random_english_tonga_tonga_islands_french_pipeline_en.md new file mode 100644 index 00000000000000..a72eb0fd473421 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-random_english_tonga_tonga_islands_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English random_english_tonga_tonga_islands_french_pipeline pipeline MarianTransformer from Jingyi28 +author: John Snow Labs +name: random_english_tonga_tonga_islands_french_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`random_english_tonga_tonga_islands_french_pipeline` is a English model originally trained by Jingyi28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/random_english_tonga_tonga_islands_french_pipeline_en_5.5.0_3.0_1726192255735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/random_english_tonga_tonga_islands_french_pipeline_en_5.5.0_3.0_1726192255735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("random_english_tonga_tonga_islands_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("random_english_tonga_tonga_islands_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|random_english_tonga_tonga_islands_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|510.3 MB| + +## References + +https://huggingface.co/Jingyi28/random-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_epoch_60_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_epoch_60_en.md new file mode 100644 index 00000000000000..53a4d57f0b18ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_epoch_60_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_60 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_60 +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_60` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_60_en_5.5.0_3.0_1726197480823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_60_en_5.5.0_3.0_1726197480823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_60","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_60","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
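+
+The `embeddings` column above holds one annotation per token, carrying both the token text and its vector. A minimal sketch for inspecting them, assuming the column names from the example:
+
+```python
+from pyspark.sql import functions as F
+
+# One row per token: the token text and its embedding vector
+pipelineDF.select(F.explode("embeddings").alias("e")) \
+    .select(F.col("e.result").alias("token"), F.col("e.embeddings").alias("vector")) \
+    .show(5, truncate=80)
+```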
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_60| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_60 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_epoch_72_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_epoch_72_en.md new file mode 100644 index 00000000000000..76f5ca18feea60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_epoch_72_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_72 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_72 +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_72` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_72_en_5.5.0_3.0_1726197337334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_72_en_5.5.0_3.0_1726197337334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_72","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_72","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_72| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_72 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_finetuned_squad_roberta_v2_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_finetuned_squad_roberta_v2_en.md new file mode 100644 index 00000000000000..ca8b13566ae725 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_finetuned_squad_roberta_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_roberta_v2 RoBertaForQuestionAnswering from seviladiguzel +author: John Snow Labs +name: roberta_base_finetuned_squad_roberta_v2 +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_roberta_v2` is a English model originally trained by seviladiguzel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_roberta_v2_en_5.5.0_3.0_1726206817959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_roberta_v2_en_5.5.0_3.0_1726206817959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_roberta_v2","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_roberta_v2", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_roberta_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.4 MB| + +## References + +https://huggingface.co/seviladiguzel/roberta-base-finetuned-squad_roberta_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_finetuned_squad_roberta_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_finetuned_squad_roberta_v2_pipeline_en.md new file mode 100644 index 00000000000000..e0f059bc40695c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_finetuned_squad_roberta_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_roberta_v2_pipeline pipeline RoBertaForQuestionAnswering from seviladiguzel +author: John Snow Labs +name: roberta_base_finetuned_squad_roberta_v2_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_roberta_v2_pipeline` is a English model originally trained by seviladiguzel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_roberta_v2_pipeline_en_5.5.0_3.0_1726206838942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_roberta_v2_pipeline_en_5.5.0_3.0_1726206838942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_squad_roberta_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_squad_roberta_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_roberta_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.4 MB| + +## References + +https://huggingface.co/seviladiguzel/roberta-base-finetuned-squad_roberta_v2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_imdb_saved_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_imdb_saved_pipeline_en.md new file mode 100644 index 00000000000000..bb4889821b39e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_imdb_saved_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_imdb_saved_pipeline pipeline RoBertaForSequenceClassification from thaile +author: John Snow Labs +name: roberta_base_imdb_saved_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_imdb_saved_pipeline` is a English model originally trained by thaile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_imdb_saved_pipeline_en_5.5.0_3.0_1726227927738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_imdb_saved_pipeline_en_5.5.0_3.0_1726227927738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_imdb_saved_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_imdb_saved_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_imdb_saved_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/thaile/roberta-base-imdb-saved + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_squad2_p60_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_squad2_p60_pipeline_en.md new file mode 100644 index 00000000000000..92538235b85983 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_squad2_p60_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_squad2_p60_pipeline pipeline RoBertaForQuestionAnswering from pminha +author: John Snow Labs +name: roberta_base_squad2_p60_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad2_p60_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_p60_pipeline_en_5.5.0_3.0_1726206993027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_p60_pipeline_en_5.5.0_3.0_1726206993027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_squad2_p60_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_squad2_p60_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad2_p60_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.0 MB| + +## References + +https://huggingface.co/pminha/roberta-base-squad2-p60 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_legal_german_cased_german_legal_squad_17_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_legal_german_cased_german_legal_squad_17_pipeline_de.md new file mode 100644 index 00000000000000..5cf49dcaf5ae50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_legal_german_cased_german_legal_squad_17_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German roberta_legal_german_cased_german_legal_squad_17_pipeline pipeline RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: roberta_legal_german_cased_german_legal_squad_17_pipeline +date: 2024-09-13 +tags: [de, open_source, pipeline, onnx] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_legal_german_cased_german_legal_squad_17_pipeline` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_legal_german_cased_german_legal_squad_17_pipeline_de_5.5.0_3.0_1726206702404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_legal_german_cased_german_legal_squad_17_pipeline_de_5.5.0_3.0_1726206702404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_legal_german_cased_german_legal_squad_17_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_legal_german_cased_german_legal_squad_17_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_legal_german_cased_german_legal_squad_17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/RoBERTa-legal-de-cased_German_legal_SQuAD_17 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_nli_ct_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_nli_ct_en.md new file mode 100644 index 00000000000000..986d16ab1a9599 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_nli_ct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_nli_ct BertSentenceEmbeddings from Contrastive-Tension +author: John Snow Labs +name: sent_bert_base_nli_ct +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_nli_ct` is a English model originally trained by Contrastive-Tension. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_ct_en_5.5.0_3.0_1726246033143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_ct_en_5.5.0_3.0_1726246033143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli_ct","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli_ct","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
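+
+To feed these sentence embeddings into a downstream Spark ML estimator, they can be converted from annotation structs into vectors. A sketch using Spark NLP's `EmbeddingsFinisher`, assuming the `embeddings` output column from the example above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["sentence_vector"]) \
+    .setOutputAsVector(True)
+
+# One Spark ML vector per detected sentence
+finisher.transform(pipelineDF).select("sentence_vector").show(1, truncate=80)
+```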
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_nli_ct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Contrastive-Tension/BERT-Base-NLI-CT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bert_medium_arabic_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_medium_arabic_pipeline_ar.md new file mode 100644 index 00000000000000..303267a6d6e749 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_medium_arabic_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_medium_arabic_pipeline pipeline BertSentenceEmbeddings from asafaya +author: John Snow Labs +name: sent_bert_medium_arabic_pipeline +date: 2024-09-13 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_medium_arabic_pipeline` is a Arabic model originally trained by asafaya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_medium_arabic_pipeline_ar_5.5.0_3.0_1726202722927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_medium_arabic_pipeline_ar_5.5.0_3.0_1726202722927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_medium_arabic_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_medium_arabic_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_medium_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|157.7 MB| + +## References + +https://huggingface.co/asafaya/bert-medium-arabic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_beto_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_beto_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..c6a7b5540229f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_beto_base_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_beto_base_cased_pipeline pipeline BertSentenceEmbeddings from espejelomar +author: John Snow Labs +name: sent_beto_base_cased_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_beto_base_cased_pipeline` is a English model originally trained by espejelomar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_beto_base_cased_pipeline_en_5.5.0_3.0_1726224497046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_beto_base_cased_pipeline_en_5.5.0_3.0_1726224497046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_beto_base_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_beto_base_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_beto_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/espejelomar/beto-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bulbert_chitanka_model_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bulbert_chitanka_model_pipeline_bg.md new file mode 100644 index 00000000000000..d67d60cb023836 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bulbert_chitanka_model_pipeline_bg.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Bulgarian sent_bulbert_chitanka_model_pipeline pipeline BertSentenceEmbeddings from mor40 +author: John Snow Labs +name: sent_bulbert_chitanka_model_pipeline +date: 2024-09-13 +tags: [bg, open_source, pipeline, onnx] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bulbert_chitanka_model_pipeline` is a Bulgarian model originally trained by mor40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bulbert_chitanka_model_pipeline_bg_5.5.0_3.0_1726224347482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bulbert_chitanka_model_pipeline_bg_5.5.0_3.0_1726224347482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bulbert_chitanka_model_pipeline", lang = "bg") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bulbert_chitanka_model_pipeline", lang = "bg") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bulbert_chitanka_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|306.6 MB| + +## References + +https://huggingface.co/mor40/BulBERT-chitanka-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_gbert_biom_translation_large_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_gbert_biom_translation_large_en.md new file mode 100644 index 00000000000000..de8afb2a81ff88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_gbert_biom_translation_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_gbert_biom_translation_large BertSentenceEmbeddings from ikim-uk-essen +author: John Snow Labs +name: sent_gbert_biom_translation_large +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gbert_biom_translation_large` is a English model originally trained by ikim-uk-essen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gbert_biom_translation_large_en_5.5.0_3.0_1726223942016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gbert_biom_translation_large_en_5.5.0_3.0_1726223942016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_gbert_biom_translation_large","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_gbert_biom_translation_large","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gbert_biom_translation_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ikim-uk-essen/GBERT-BioM-Translation-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_indo_legalbert_id.md b/docs/_posts/ahmedlone127/2024-09-13-sent_indo_legalbert_id.md new file mode 100644 index 00000000000000..6cf923b9bafaec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_indo_legalbert_id.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Indonesian sent_indo_legalbert BertSentenceEmbeddings from archi-ai +author: John Snow Labs +name: sent_indo_legalbert +date: 2024-09-13 +tags: [id, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_indo_legalbert` is a Indonesian model originally trained by archi-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_indo_legalbert_id_5.5.0_3.0_1726230093671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_indo_legalbert_id_5.5.0_3.0_1726230093671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_indo_legalbert","id") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_indo_legalbert","id") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_indo_legalbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|id| +|Size:|1.3 GB| + +## References + +https://huggingface.co/archi-ai/Indo-LegalBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_jobgbert_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-13-sent_jobgbert_pipeline_de.md new file mode 100644 index 00000000000000..1a853ac4e4e054 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_jobgbert_pipeline_de.md @@ -0,0 +1,71 @@ +--- +layout: model +title: German sent_jobgbert_pipeline pipeline BertSentenceEmbeddings from agne +author: John Snow Labs +name: sent_jobgbert_pipeline +date: 2024-09-13 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_jobgbert_pipeline` is a German model originally trained by agne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_jobgbert_pipeline_de_5.5.0_3.0_1726229925560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_jobgbert_pipeline_de_5.5.0_3.0_1726229925560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_jobgbert_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_jobgbert_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
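+
+The snippet above transforms a DataFrame `df` that it never defines. A minimal sketch of how it could be built, assuming a SparkSession started through `sparknlp.start()` and an illustrative German sentence:
+
+```python
+# Minimal sketch of the input `df` assumed by the pipeline example above.
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+# One row with a "text" column; the sentence is only an illustration.
+df = spark.createDataFrame(
+    [["Wir suchen eine erfahrene Softwareentwicklerin für unser Team in Berlin."]]
+).toDF("text")
+
+pipeline = PretrainedPipeline("sent_jobgbert_pipeline", lang="de")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the columns produced by the pipeline stages
+```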
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_jobgbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|410.1 MB| + +## References + +https://huggingface.co/agne/jobGBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_varta_bert_xx.md b/docs/_posts/ahmedlone127/2024-09-13-sent_varta_bert_xx.md new file mode 100644 index 00000000000000..a06e19357e9587 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_varta_bert_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_varta_bert BertSentenceEmbeddings from rahular +author: John Snow Labs +name: sent_varta_bert +date: 2024-09-13 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_varta_bert` is a Multilingual model originally trained by rahular. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_varta_bert_xx_5.5.0_3.0_1726224153836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_varta_bert_xx_5.5.0_3.0_1726224153836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_varta_bert","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_varta_bert","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_varta_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|691.1 MB| + +## References + +https://huggingface.co/rahular/varta-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sms_spam_model_v1_2_en.md b/docs/_posts/ahmedlone127/2024-09-13-sms_spam_model_v1_2_en.md new file mode 100644 index 00000000000000..30b7a25f438450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sms_spam_model_v1_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sms_spam_model_v1_2 DistilBertForSequenceClassification from xia0t1an +author: John Snow Labs +name: sms_spam_model_v1_2 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sms_spam_model_v1_2` is a English model originally trained by xia0t1an. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sms_spam_model_v1_2_en_5.5.0_3.0_1726242374333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sms_spam_model_v1_2_en_5.5.0_3.0_1726242374333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("sms_spam_model_v1_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sms_spam_model_v1_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
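+
+Once `pipelineModel.transform(data)` has run, the predicted label for each row sits in the `class` annotation column. A short sketch, continuing from the Python example above:
+
+```python
+# Continues from the Python example above: read the predicted label per input row.
+from pyspark.sql import functions as F
+
+pipelineDF.select(
+    F.col("text"),
+    F.col("class.result").alias("prediction")  # array with one label per document
+).show(truncate=False)
+```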
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sms_spam_model_v1_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|200.3 MB| + +## References + +https://huggingface.co/xia0t1an/sms-spam-model-v1_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-splade_pp_english_v2_en.md b/docs/_posts/ahmedlone127/2024-09-13-splade_pp_english_v2_en.md new file mode 100644 index 00000000000000..ea53516a5a4335 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-splade_pp_english_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English splade_pp_english_v2 BertEmbeddings from prithivida +author: John Snow Labs +name: splade_pp_english_v2 +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`splade_pp_english_v2` is a English model originally trained by prithivida. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/splade_pp_english_v2_en_5.5.0_3.0_1726229077304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/splade_pp_english_v2_en_5.5.0_3.0_1726229077304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("splade_pp_english_v2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("splade_pp_english_v2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|splade_pp_english_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/prithivida/Splade_PP_en_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-tamil_codemixed_abusive_muril_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-tamil_codemixed_abusive_muril_pipeline_en.md new file mode 100644 index 00000000000000..2d586b826b89f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-tamil_codemixed_abusive_muril_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tamil_codemixed_abusive_muril_pipeline pipeline BertForSequenceClassification from Hate-speech-CNERG +author: John Snow Labs +name: tamil_codemixed_abusive_muril_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamil_codemixed_abusive_muril_pipeline` is a English model originally trained by Hate-speech-CNERG. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamil_codemixed_abusive_muril_pipeline_en_5.5.0_3.0_1726201341747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamil_codemixed_abusive_muril_pipeline_en_5.5.0_3.0_1726201341747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tamil_codemixed_abusive_muril_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tamil_codemixed_abusive_muril_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamil_codemixed_abusive_muril_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|892.7 MB| + +## References + +https://huggingface.co/Hate-speech-CNERG/tamil-codemixed-abusive-MuRIL + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-test_seven_ko.md b/docs/_posts/ahmedlone127/2024-09-13-test_seven_ko.md new file mode 100644 index 00000000000000..6ea4d0f38c9980 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-test_seven_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean test_seven WhisperForCTC from kyungmin011029 +author: John Snow Labs +name: test_seven +date: 2024-09-13 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_seven` is a Korean model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_seven_ko_5.5.0_3.0_1726217383045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_seven_ko_5.5.0_3.0_1726217383045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with a raw-audio "audio_content" column (float samples).
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("test_seven","ko") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with a raw-audio "audio_content" column (float samples).
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("test_seven", "ko")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
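+
+Both snippets above transform a `data` DataFrame that is never created. A minimal sketch of building it from a local audio file, assuming `librosa` is available, the recording is resampled to 16 kHz mono, and `"sample_ko.wav"` is a placeholder path:
+
+```python
+# Minimal sketch: build the `data` DataFrame expected by the Whisper example above.
+import librosa
+
+# Load and resample to 16 kHz mono; librosa returns float32 samples.
+waveform, _ = librosa.load("sample_ko.wav", sr=16000)
+
+# Column name matches AudioAssembler.setInputCol("audio_content") above.
+data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
+```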
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_seven| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/kyungmin011029/test_seven \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-text_classification_rachelle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-text_classification_rachelle_pipeline_en.md new file mode 100644 index 00000000000000..ad15ef6109287d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-text_classification_rachelle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_rachelle_pipeline pipeline DistilBertForSequenceClassification from RachelLe +author: John Snow Labs +name: text_classification_rachelle_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_rachelle_pipeline` is a English model originally trained by RachelLe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_rachelle_pipeline_en_5.5.0_3.0_1726262770339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_rachelle_pipeline_en_5.5.0_3.0_1726262770339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_rachelle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_rachelle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_rachelle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RachelLe/text_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-tiny_bert_0102_4000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-tiny_bert_0102_4000_pipeline_en.md new file mode 100644 index 00000000000000..b35cb88f19fc2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-tiny_bert_0102_4000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tiny_bert_0102_4000_pipeline pipeline AlbertForSequenceClassification from gg-ai +author: John Snow Labs +name: tiny_bert_0102_4000_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_bert_0102_4000_pipeline` is a English model originally trained by gg-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_bert_0102_4000_pipeline_en_5.5.0_3.0_1726225701285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_bert_0102_4000_pipeline_en_5.5.0_3.0_1726225701285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_bert_0102_4000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_bert_0102_4000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_bert_0102_4000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|20.5 MB| + +## References + +https://huggingface.co/gg-ai/tiny-bert-0102-4000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline_en.md new file mode 100644 index 00000000000000..b18255b4b05ed8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline_en_5.5.0_3.0_1726254010467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline_en_5.5.0_3.0_1726254010467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
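+
+Unlike the text pipelines above, the `df` passed to this ASR pipeline has to carry raw audio samples rather than text. A rough sketch, assuming the pipeline's AudioAssembler stage reads a column named `audio_content` of float samples:
+
+```python
+# Hypothetical input row for an ASR pretrained pipeline: 16 kHz float samples.
+from sparknlp.pretrained import PretrainedPipeline
+
+audio_floats = [0.0] * 16000  # one second of silence, purely illustrative
+df = spark.createDataFrame([[audio_floats]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline", lang="en")
+annotations = pipeline.transform(df)
+```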
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_emsassist_2_25_0_15_16_1e_05_sandy_sweep_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.7 MB| + +## References + +https://huggingface.co/saahith/tiny.en-EMSAssist-2-25-0.15-16-1e-05-sandy-sweep-4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-toxic_comment_model_ethos_ft_en.md b/docs/_posts/ahmedlone127/2024-09-13-toxic_comment_model_ethos_ft_en.md new file mode 100644 index 00000000000000..efa1124f6eabe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-toxic_comment_model_ethos_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxic_comment_model_ethos_ft DistilBertForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: toxic_comment_model_ethos_ft +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxic_comment_model_ethos_ft` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxic_comment_model_ethos_ft_en_5.5.0_3.0_1726242262688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxic_comment_model_ethos_ft_en_5.5.0_3.0_1726242262688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxic_comment_model_ethos_ft","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxic_comment_model_ethos_ft", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxic_comment_model_ethos_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fatmhd1995/toxic_comment_model_ethos_ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-tweets_text_generation_sentiment_analysis_uploaded0519_en.md b/docs/_posts/ahmedlone127/2024-09-13-tweets_text_generation_sentiment_analysis_uploaded0519_en.md new file mode 100644 index 00000000000000..bdf2edee52bd8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-tweets_text_generation_sentiment_analysis_uploaded0519_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tweets_text_generation_sentiment_analysis_uploaded0519 RoBertaForSequenceClassification from WooHoo86 +author: John Snow Labs +name: tweets_text_generation_sentiment_analysis_uploaded0519 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tweets_text_generation_sentiment_analysis_uploaded0519` is a English model originally trained by WooHoo86. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tweets_text_generation_sentiment_analysis_uploaded0519_en_5.5.0_3.0_1726187808763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tweets_text_generation_sentiment_analysis_uploaded0519_en_5.5.0_3.0_1726187808763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("tweets_text_generation_sentiment_analysis_uploaded0519","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("tweets_text_generation_sentiment_analysis_uploaded0519", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tweets_text_generation_sentiment_analysis_uploaded0519| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/WooHoo86/tweets-text-generation-sentiment-analysis-uploaded0519 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-twitter_roberta_base_dec2021_tweet_topic_multi_2020_en.md b/docs/_posts/ahmedlone127/2024-09-13-twitter_roberta_base_dec2021_tweet_topic_multi_2020_en.md new file mode 100644 index 00000000000000..fb7cc0316022ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-twitter_roberta_base_dec2021_tweet_topic_multi_2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_dec2021_tweet_topic_multi_2020 RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_dec2021_tweet_topic_multi_2020 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_dec2021_tweet_topic_multi_2020` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2021_tweet_topic_multi_2020_en_5.5.0_3.0_1726187706546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2021_tweet_topic_multi_2020_en_5.5.0_3.0_1726187706546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_dec2021_tweet_topic_multi_2020","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_dec2021_tweet_topic_multi_2020", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_dec2021_tweet_topic_multi_2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-multi-2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_ewe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_ewe_pipeline_en.md new file mode 100644 index 00000000000000..e740bd6930455a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_ewe_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_ewe_pipeline pipeline WhisperForCTC from abiyo27 +author: John Snow Labs +name: whisper_small_ewe_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ewe_pipeline` is a English model originally trained by abiyo27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ewe_pipeline_en_5.5.0_3.0_1726250434439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ewe_pipeline_en_5.5.0_3.0_1726250434439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_ewe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_ewe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ewe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abiyo27/whisper-small-ewe + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_ft_001_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_ft_001_pipeline_hi.md new file mode 100644 index 00000000000000..424677e1f0b7f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_ft_001_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_ft_001_pipeline pipeline WhisperForCTC from shinsu +author: John Snow Labs +name: whisper_small_ft_001_pipeline +date: 2024-09-13 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ft_001_pipeline` is a Hindi model originally trained by shinsu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ft_001_pipeline_hi_5.5.0_3.0_1726221839347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ft_001_pipeline_hi_5.5.0_3.0_1726221839347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_ft_001_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_ft_001_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ft_001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shinsu/whisper-small-ft-001 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_voice_conversion_korean_10min_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_voice_conversion_korean_10min_pipeline_en.md new file mode 100644 index 00000000000000..ce0fd4178d184a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_voice_conversion_korean_10min_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_voice_conversion_korean_10min_pipeline pipeline WhisperForCTC from ghost613 +author: John Snow Labs +name: whisper_small_voice_conversion_korean_10min_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_voice_conversion_korean_10min_pipeline` is a English model originally trained by ghost613. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_voice_conversion_korean_10min_pipeline_en_5.5.0_3.0_1726257042476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_voice_conversion_korean_10min_pipeline_en_5.5.0_3.0_1726257042476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_voice_conversion_korean_10min_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_voice_conversion_korean_10min_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_voice_conversion_korean_10min_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ghost613/whisper-small-voice-conversion-korean-10min + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_claroline_test_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_claroline_test_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..b0968c571e3c93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_claroline_test_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_claroline_test_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_claroline_test_tags_cwadj_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_claroline_test_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_claroline_test_tags_cwadj_pipeline_en_5.5.0_3.0_1726262878733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_claroline_test_tags_cwadj_pipeline_en_5.5.0_3.0_1726262878733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_claroline_test_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_claroline_test_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_claroline_test_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-claroline_test-tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_r_galen_ctebmsp_es.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_r_galen_ctebmsp_es.md new file mode 100644 index 00000000000000..59c1795930b9c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_r_galen_ctebmsp_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_ctebmsp XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_ctebmsp +date: 2024-09-13 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_ctebmsp` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_ctebmsp_es_5.5.0_3.0_1726211613163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_ctebmsp_es_5.5.0_3.0_1726211613163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_ctebmsp","es") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_ctebmsp", "es")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
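+
+After the transform, the token-level predictions live in `ner.result`, aligned one-to-one with `token.result`. Continuing from the Python example above:
+
+```python
+# Continues from the Python example above: inspect tokens next to their NER tags.
+from pyspark.sql import functions as F
+
+pipelineDF.select(
+    F.col("token.result").alias("tokens"),
+    F.col("ner.result").alias("ner_tags")
+).show(truncate=False)
+```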
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_ctebmsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-ctebmsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline_en.md new file mode 100644 index 00000000000000..bac0212b3caa05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline_en_5.5.0_3.0_1726259208383.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline_en_5.5.0_3.0_1726259208383.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_insert_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.5 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_insert_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline_en.md new file mode 100644 index 00000000000000..6b557336920a43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline pipeline XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline` is a English model originally trained by MonkDalma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline_en_5.5.0_3.0_1726215698897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline_en_5.5.0_3.0_1726215698897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_monkdalma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_french_vonewman_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_french_vonewman_en.md new file mode 100644 index 00000000000000..cc3d1725e03fd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_french_vonewman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_vonewman XlmRoBertaForTokenClassification from vonewman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_vonewman +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_vonewman` is a English model originally trained by vonewman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_vonewman_en_5.5.0_3.0_1726215490144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_vonewman_en_5.5.0_3.0_1726215490144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_vonewman","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_vonewman", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
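+
+A common follow-up is to group the IOB tags into entity chunks. A sketch using Spark NLP's `NerConverter`, reusing the stages and column names (`document`, `token`, `ner`) from the Python example above:
+
+```python
+# Sketch: turn the token-level IOB tags from the example above into entity chunks.
+from sparknlp.annotator import NerConverter
+from pyspark.ml import Pipeline
+
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+chunkPipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+chunkDF = chunkPipeline.fit(data).transform(data)
+chunkDF.select("ner_chunk.result").show(truncate=False)
+```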
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_vonewman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|823.9 MB| + +## References + +https://huggingface.co/vonewman/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_aaa01101312_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_aaa01101312_en.md new file mode 100644 index 00000000000000..1b197c3b3f2d41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_aaa01101312_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_aaa01101312 XlmRoBertaForTokenClassification from AAA01101312 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_aaa01101312 +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_aaa01101312` is a English model originally trained by AAA01101312. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_aaa01101312_en_5.5.0_3.0_1726216303953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_aaa01101312_en_5.5.0_3.0_1726216303953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_aaa01101312","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_aaa01101312", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_aaa01101312| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/AAA01101312/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_jgriffi_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_jgriffi_en.md new file mode 100644 index 00000000000000..ff1756b3f36d35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_jgriffi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jgriffi XlmRoBertaForTokenClassification from jgriffi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jgriffi +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_jgriffi` is a English model originally trained by jgriffi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jgriffi_en_5.5.0_3.0_1726238882397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jgriffi_en_5.5.0_3.0_1726238882397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jgriffi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jgriffi", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jgriffi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/jgriffi/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline_en.md new file mode 100644 index 00000000000000..11a85bce706977 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline pipeline XlmRoBertaForTokenClassification from jgriffi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline` is a English model originally trained by jgriffi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline_en_5.5.0_3.0_1726238943918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline_en_5.5.0_3.0_1726238943918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
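+
+The pipeline snippet above assumes a DataFrame `df` with a `text` column already exists; a minimal sketch of creating one (an illustrative assumption, not part of the original card):
+
+```python
+# One-row input DataFrame with the column name the pretrained pipeline expects.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```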
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jgriffi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/jgriffi/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_italian_lee_soha_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_italian_lee_soha_en.md new file mode 100644 index 00000000000000..9ce29b83a2eff4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_italian_lee_soha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_lee_soha XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_lee_soha +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_lee_soha` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_lee_soha_en_5.5.0_3.0_1726239169683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_lee_soha_en_5.5.0_3.0_1726239169683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_lee_soha","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_lee_soha", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_lee_soha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-adhd_en.md b/docs/_posts/ahmedlone127/2024-09-14-adhd_en.md new file mode 100644 index 00000000000000..c0c54813714515 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-adhd_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English adhd WhisperForCTC from manal123 +author: John Snow Labs +name: adhd +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adhd` is a English model originally trained by manal123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adhd_en_5.5.0_3.0_1726329597237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adhd_en_5.5.0_3.0_1726329597237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("adhd","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("adhd", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
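+
+The Whisper example above references `data` without defining it; a minimal sketch, assuming `rawFloats` is a Python list of float audio samples loaded elsewhere (an assumption for illustration only):
+
+```python
+# Hypothetical preparation of the column that AudioAssembler reads.
+data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
+```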
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adhd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/manal123/ADHD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-albert_base_v2_wnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-albert_base_v2_wnli_pipeline_en.md new file mode 100644 index 00000000000000..c9b81136550d3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-albert_base_v2_wnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_base_v2_wnli_pipeline pipeline AlbertForSequenceClassification from textattack +author: John Snow Labs +name: albert_base_v2_wnli_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_v2_wnli_pipeline` is a English model originally trained by textattack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_v2_wnli_pipeline_en_5.5.0_3.0_1726309455768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_v2_wnli_pipeline_en_5.5.0_3.0_1726309455768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_base_v2_wnli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_base_v2_wnli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_v2_wnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/textattack/albert-base-v2-WNLI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-arabert_large_algerian_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-14-arabert_large_algerian_pipeline_ar.md new file mode 100644 index 00000000000000..e27dcbdf256438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-arabert_large_algerian_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabert_large_algerian_pipeline pipeline BertForSequenceClassification from Abdou +author: John Snow Labs +name: arabert_large_algerian_pipeline +date: 2024-09-14 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_large_algerian_pipeline` is a Arabic model originally trained by Abdou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_large_algerian_pipeline_ar_5.5.0_3.0_1726348516763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_large_algerian_pipeline_ar_5.5.0_3.0_1726348516763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arabert_large_algerian_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arabert_large_algerian_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_large_algerian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Abdou/arabert-large-algerian + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-arwiki_20230101_roberta_mlm_bots_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-14-arwiki_20230101_roberta_mlm_bots_pipeline_ar.md new file mode 100644 index 00000000000000..f469d3150138d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-arwiki_20230101_roberta_mlm_bots_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arwiki_20230101_roberta_mlm_bots_pipeline pipeline RoBertaEmbeddings from SaiedAlshahrani +author: John Snow Labs +name: arwiki_20230101_roberta_mlm_bots_pipeline +date: 2024-09-14 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arwiki_20230101_roberta_mlm_bots_pipeline` is a Arabic model originally trained by SaiedAlshahrani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arwiki_20230101_roberta_mlm_bots_pipeline_ar_5.5.0_3.0_1726338723341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arwiki_20230101_roberta_mlm_bots_pipeline_ar_5.5.0_3.0_1726338723341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arwiki_20230101_roberta_mlm_bots_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arwiki_20230101_roberta_mlm_bots_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arwiki_20230101_roberta_mlm_bots_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|311.7 MB| + +## References + +https://huggingface.co/SaiedAlshahrani/arwiki_20230101_roberta_mlm_bots + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-banglabert_sentiment_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-14-banglabert_sentiment_pipeline_bn.md new file mode 100644 index 00000000000000..c3bb52cd0f5a6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-banglabert_sentiment_pipeline_bn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bengali banglabert_sentiment_pipeline pipeline BertForSequenceClassification from ka05ar +author: John Snow Labs +name: banglabert_sentiment_pipeline +date: 2024-09-14 +tags: [bn, open_source, pipeline, onnx] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert_sentiment_pipeline` is a Bengali model originally trained by ka05ar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_sentiment_pipeline_bn_5.5.0_3.0_1726348021703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_sentiment_pipeline_bn_5.5.0_3.0_1726348021703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("banglabert_sentiment_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("banglabert_sentiment_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|414.5 MB| + +## References + +https://huggingface.co/ka05ar/banglabert-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_base_german_cased_fluency_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-14-bert_base_german_cased_fluency_pipeline_de.md new file mode 100644 index 00000000000000..f32b9bdb496f47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_base_german_cased_fluency_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German bert_base_german_cased_fluency_pipeline pipeline BertForSequenceClassification from EIStakovskii +author: John Snow Labs +name: bert_base_german_cased_fluency_pipeline +date: 2024-09-14 +tags: [de, open_source, pipeline, onnx] +task: Text Classification +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_fluency_pipeline` is a German model originally trained by EIStakovskii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_fluency_pipeline_de_5.5.0_3.0_1726347874664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_fluency_pipeline_de_5.5.0_3.0_1726347874664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_german_cased_fluency_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_german_cased_fluency_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_fluency_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|409.1 MB| + +## References + +https://huggingface.co/EIStakovskii/bert-base-german-cased_fluency + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_base_multilingual_cased_sentiment_multilingual_xx.md b/docs/_posts/ahmedlone127/2024-09-14-bert_base_multilingual_cased_sentiment_multilingual_xx.md new file mode 100644 index 00000000000000..46002e3aa9e485 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_base_multilingual_cased_sentiment_multilingual_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_sentiment_multilingual BertForSequenceClassification from cardiffnlp +author: John Snow Labs +name: bert_base_multilingual_cased_sentiment_multilingual +date: 2024-09-14 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_sentiment_multilingual` is a Multilingual model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sentiment_multilingual_xx_5.5.0_3.0_1726348126772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sentiment_multilingual_xx_5.5.0_3.0_1726348126772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_cased_sentiment_multilingual","xx") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_cased_sentiment_multilingual", "xx")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
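+
+For quick single-text inference, the fitted pipeline can also be wrapped in a LightPipeline; a minimal sketch, assuming the `pipelineModel` fitted above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict of annotations; the predicted sentiment appears under "class".
+result = light.annotate("I love spark-nlp")
+```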
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_sentiment_multilingual| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|667.3 MB| + +## References + +https://huggingface.co/cardiffnlp/bert-base-multilingual-cased-sentiment-multilingual \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..ec9532ae2a1eba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726327111093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726327111093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_4_87_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-4.87-b-32-lr-4e-06-dp-0.1-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline_en.md new file mode 100644 index 00000000000000..39fac8c471ee71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline pipeline BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline_en_5.5.0_3.0_1726350048794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline_en_5.5.0_3.0_1726350048794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_v2_lauraparra28_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-uncased-finetuned-squad_v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_alban12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_alban12_pipeline_en.md new file mode 100644 index 00000000000000..0b4cc3affe4fcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_alban12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_alban12_pipeline pipeline BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_finetuned_ner_alban12_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_alban12_pipeline` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_alban12_pipeline_en_5.5.0_3.0_1726305386837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_alban12_pipeline_en_5.5.0_3.0_1726305386837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_alban12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_alban12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_alban12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/alban12/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-brwac_v1_3__checkpoint8_en.md b/docs/_posts/ahmedlone127/2024-09-14-brwac_v1_3__checkpoint8_en.md new file mode 100644 index 00000000000000..109d6983e65d8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-brwac_v1_3__checkpoint8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint8 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint8 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint8` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint8_en_5.5.0_3.0_1726334406897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint8_en_5.5.0_3.0_1726334406897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint8","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint8","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
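+
+To feed the token embeddings into a downstream Spark ML stage, an EmbeddingsFinisher can convert the annotation structs into plain vectors; a minimal sketch, assuming the `pipelineDF` produced above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+# Adds a column of Spark vectors, one per token.
+finished = finisher.transform(pipelineDF)
+```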
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.6 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_model_seq_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_model_seq_en.md new file mode 100644 index 00000000000000..761485d4fd6d71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_model_seq_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_seq AlbertForSequenceClassification from mkim-MASI +author: John Snow Labs +name: burmese_awesome_model_seq +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_seq` is a English model originally trained by mkim-MASI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_seq_en_5.5.0_3.0_1726309336759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_seq_en_5.5.0_3.0_1726309336759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = AlbertForSequenceClassification.pretrained("burmese_awesome_model_seq","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = AlbertForSequenceClassification.pretrained("burmese_awesome_model_seq", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_seq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/mkim-MASI/my_awesome_model_seq \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_ai_bled_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_ai_bled_en.md new file mode 100644 index 00000000000000..665bc1df6f0e42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_ai_bled_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ai_bled DistilBertForQuestionAnswering from ai-bled +author: John Snow Labs +name: burmese_awesome_qa_model_ai_bled +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ai_bled` is a English model originally trained by ai-bled. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ai_bled_en_5.5.0_3.0_1726335930555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ai_bled_en_5.5.0_3.0_1726335930555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ai_bled","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ai_bled", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
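+
+Once transformed, the predicted answer text can be read back from the `answer` column; a short sketch assuming the `pipelineDF` from the example above:
+
+```python
+# `answer.result` is an array holding the extracted answer span(s).
+pipelineDF.select("answer.result").show(truncate=False)
+```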
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ai_bled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ai-bled/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_sazara_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_sazara_pipeline_en.md new file mode 100644 index 00000000000000..4a9fa8f1702750 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_sazara_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_sazara_pipeline pipeline DistilBertForQuestionAnswering from sazara +author: John Snow Labs +name: burmese_awesome_qa_model_sazara_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_sazara_pipeline` is a English model originally trained by sazara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_sazara_pipeline_en_5.5.0_3.0_1726335553323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_sazara_pipeline_en_5.5.0_3.0_1726335553323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_sazara_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_sazara_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_sazara_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sazara/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_fine_tuned_qa_model_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_fine_tuned_qa_model_en.md new file mode 100644 index 00000000000000..a90f29b3b197ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_fine_tuned_qa_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_fine_tuned_qa_model RoBertaForQuestionAnswering from Jahanzaibrao +author: John Snow Labs +name: burmese_fine_tuned_qa_model +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_qa_model` is a English model originally trained by Jahanzaibrao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_qa_model_en_5.5.0_3.0_1726342658902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_qa_model_en_5.5.0_3.0_1726342658902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("burmese_fine_tuned_qa_model","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("burmese_fine_tuned_qa_model", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_qa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.5 MB| + +## References + +https://huggingface.co/Jahanzaibrao/my_fine_tuned_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-clr_finetuned_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-14-clr_finetuned_roberta_large_en.md new file mode 100644 index 00000000000000..0eb793ad2f7904 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-clr_finetuned_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clr_finetuned_roberta_large RoBertaEmbeddings from SauravMaheshkar +author: John Snow Labs +name: clr_finetuned_roberta_large +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clr_finetuned_roberta_large` is a English model originally trained by SauravMaheshkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clr_finetuned_roberta_large_en_5.5.0_3.0_1726300381050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clr_finetuned_roberta_large_en_5.5.0_3.0_1726300381050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("clr_finetuned_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("clr_finetuned_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clr_finetuned_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/SauravMaheshkar/clr-finetuned-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-cnec_2_0_czert_b_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-cnec_2_0_czert_b_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..6b52b6be2aa533 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-cnec_2_0_czert_b_base_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cnec_2_0_czert_b_base_cased_pipeline pipeline BertForTokenClassification from stulcrad +author: John Snow Labs +name: cnec_2_0_czert_b_base_cased_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnec_2_0_czert_b_base_cased_pipeline` is a English model originally trained by stulcrad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnec_2_0_czert_b_base_cased_pipeline_en_5.5.0_3.0_1726306127760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnec_2_0_czert_b_base_cased_pipeline_en_5.5.0_3.0_1726306127760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cnec_2_0_czert_b_base_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cnec_2_0_czert_b_base_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnec_2_0_czert_b_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/stulcrad/CNEC_2_0_Czert-B-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-dataequity_opus_maltese_arabic_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-dataequity_opus_maltese_arabic_spanish_pipeline_en.md new file mode 100644 index 00000000000000..1cbb95cbd0a8ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-dataequity_opus_maltese_arabic_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dataequity_opus_maltese_arabic_spanish_pipeline pipeline MarianTransformer from dataequity +author: John Snow Labs +name: dataequity_opus_maltese_arabic_spanish_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataequity_opus_maltese_arabic_spanish_pipeline` is a English model originally trained by dataequity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_arabic_spanish_pipeline_en_5.5.0_3.0_1726350581018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_arabic_spanish_pipeline_en_5.5.0_3.0_1726350581018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dataequity_opus_maltese_arabic_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dataequity_opus_maltese_arabic_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataequity_opus_maltese_arabic_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|527.3 MB| + +## References + +https://huggingface.co/dataequity/dataequity-opus-mt-ar-es + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_jkhsong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_jkhsong_pipeline_en.md new file mode 100644 index 00000000000000..f0551926aec7b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_jkhsong_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_jkhsong_pipeline pipeline DistilBertForQuestionAnswering from jkhsong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_jkhsong_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_jkhsong_pipeline` is a English model originally trained by jkhsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_jkhsong_pipeline_en_5.5.0_3.0_1726335548123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_jkhsong_pipeline_en_5.5.0_3.0_1726335548123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_jkhsong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_jkhsong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
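+
+The `df` used above is assumed to carry the two input columns read by the pipeline's MultiDocumentAssembler; the column names below ("question", "context") are an assumption and should be checked against the pipeline's first stage if results come back empty:
+
+```python
+# Hypothetical input frame for the question-answering pipeline; the column
+# names are illustrative and may need to match the pipeline's configured inputs.
+df = spark.createDataFrame(
+    [["What is my name?", "My name is Clara and I live in Berkeley."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```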
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_jkhsong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/jkhsong/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_tokenclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_tokenclassification_pipeline_en.md new file mode 100644 index 00000000000000..daa0822fb4d479 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_tokenclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_tokenclassification_pipeline pipeline RoBertaForTokenClassification from urisoo +author: John Snow Labs +name: distilbert_base_uncased_tokenclassification_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tokenclassification_pipeline` is a English model originally trained by urisoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tokenclassification_pipeline_en_5.5.0_3.0_1726301382626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tokenclassification_pipeline_en_5.5.0_3.0_1726301382626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_tokenclassification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_tokenclassification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tokenclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/urisoo/distilbert-base-uncased-tokenclassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilbert_qa_bizlin_distil_model_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilbert_qa_bizlin_distil_model_en.md new file mode 100644 index 00000000000000..eb2b06d282fb6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilbert_qa_bizlin_distil_model_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForQuestionAnswering model (from pakupoko) +author: John Snow Labs +name: distilbert_qa_bizlin_distil_model +date: 2024-09-14 +tags: [en, open_source, distilbert, question_answering, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bizlin-distil-model` is a English model originally trained by `pakupoko`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_qa_bizlin_distil_model_en_5.5.0_3.0_1726335626921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_qa_bizlin_distil_model_en_5.5.0_3.0_1726335626921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+.setInputCols(["question", "context"]) \
+.setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_qa_bizlin_distil_model","en") \
+.setInputCols(["document_question", "document_context"]) \
+.setOutputCol("answer")\
+.setCaseSensitive(True)
+
+pipeline = Pipeline(stages=[documentAssembler, spanClassifier])
+
+data = spark.createDataFrame([["What is my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+.setInputCols(Array("question", "context"))
+.setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_qa_bizlin_distil_model","en")
+.setInputCols(Array("document_question", "document_context"))
+.setOutputCol("answer")
+.setCaseSensitive(true)
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+
+val data = Seq(("What is my name?", "My name is Clara and I live in Berkeley.")).toDF("question", "context")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.answer_question.distil_bert.by_pakupoko").predict("""What is my name?|||My name is Clara and I live in Berkeley.""")
+```
+
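+
+Once `result` is available, the predicted span can be read from the `answer` output column set above; for example:
+
+```python
+# Each row's "answer" column holds Spark NLP annotations; `result` exposes the
+# predicted answer text.
+result.select("answer.result").show(truncate=False)
+```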
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_qa_bizlin_distil_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|104.3 MB| + +## References + +References + +- https://huggingface.co/pakupoko/bizlin-distil-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_mrpc_glue_gilbertoesp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_mrpc_glue_gilbertoesp_pipeline_en.md new file mode 100644 index 00000000000000..f073c8e1134f44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_mrpc_glue_gilbertoesp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_mrpc_glue_gilbertoesp_pipeline pipeline RoBertaForSequenceClassification from gilbertoesp +author: John Snow Labs +name: distilroberta_base_mrpc_glue_gilbertoesp_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mrpc_glue_gilbertoesp_pipeline` is a English model originally trained by gilbertoesp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_gilbertoesp_pipeline_en_5.5.0_3.0_1726272425013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_gilbertoesp_pipeline_en_5.5.0_3.0_1726272425013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_mrpc_glue_gilbertoesp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_mrpc_glue_gilbertoesp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mrpc_glue_gilbertoesp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/gilbertoesp/distilroberta-base-mrpc-glue + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-efficient_mlm_m0_20_en.md b/docs/_posts/ahmedlone127/2024-09-14-efficient_mlm_m0_20_en.md new file mode 100644 index 00000000000000..e1ea6993f046e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-efficient_mlm_m0_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English efficient_mlm_m0_20 RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_20 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_20` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_20_en_5.5.0_3.0_1726334639324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_20_en_5.5.0_3.0_1726334639324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_20","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_20","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
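+
+To confirm the embeddings were produced, the per-token vectors in the `embeddings` column can be exploded and inspected; a short sketch:
+
+```python
+from pyspark.sql import functions as F
+
+# One annotation per token; the nested "embeddings" field holds the vector.
+pipelineDF.select(F.explode("embeddings.embeddings").alias("token_embedding")).show(3)
+```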
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|842.7 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-efficient_mlm_m0_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-efficient_mlm_m0_20_pipeline_en.md new file mode 100644 index 00000000000000..404a40e3338f83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-efficient_mlm_m0_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English efficient_mlm_m0_20_pipeline pipeline RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_20_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_20_pipeline` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_20_pipeline_en_5.5.0_3.0_1726334876068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_20_pipeline_en_5.5.0_3.0_1726334876068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("efficient_mlm_m0_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("efficient_mlm_m0_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
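+
+For quick, single-string experimentation the pretrained pipeline also exposes `fullAnnotate`; a small sketch (the keys of the returned dictionary depend on the pipeline's output columns):
+
+```python
+# fullAnnotate returns one dictionary per input string, keyed by the
+# pipeline's output columns (document, token, embeddings, ...).
+result = pipeline.fullAnnotate("I love spark-nlp")[0]
+print(result.keys())
+```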
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|842.8 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-english_vietnamese_translation_en.md b/docs/_posts/ahmedlone127/2024-09-14-english_vietnamese_translation_en.md new file mode 100644 index 00000000000000..8314c7aeda7939 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-english_vietnamese_translation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_vietnamese_translation MarianTransformer from lainguyen +author: John Snow Labs +name: english_vietnamese_translation +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_vietnamese_translation` is a English model originally trained by lainguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_vietnamese_translation_en_5.5.0_3.0_1726351574664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_vietnamese_translation_en_5.5.0_3.0_1726351574664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("english_vietnamese_translation","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("english_vietnamese_translation","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
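+
+With the stages wired as above, the translated text lands in the `translation` column:
+
+```python
+# One translated string per detected sentence.
+pipelineDF.select("translation.result").show(truncate=False)
+```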
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_vietnamese_translation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|474.4 MB| + +## References + +https://huggingface.co/lainguyen/en_vi_translation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-faiedrobertabasef_en.md b/docs/_posts/ahmedlone127/2024-09-14-faiedrobertabasef_en.md new file mode 100644 index 00000000000000..11775be57bbedb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-faiedrobertabasef_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English faiedrobertabasef RoBertaForQuestionAnswering from adamfendri +author: John Snow Labs +name: faiedrobertabasef +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`faiedrobertabasef` is a English model originally trained by adamfendri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/faiedrobertabasef_en_5.5.0_3.0_1726343226743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/faiedrobertabasef_en_5.5.0_3.0_1726343226743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("faiedrobertabasef","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("faiedrobertabasef", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
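+
+For ad-hoc inference on a single question/context pair, a `LightPipeline` avoids building a DataFrame; a sketch, assuming the fitted `pipelineModel` from above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# The first string is treated as the question, the second as the context.
+light = LightPipeline(pipelineModel)
+annotations = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")[0]
+print(annotations["answer"])
+```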
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|faiedrobertabasef| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.6 MB| + +## References + +https://huggingface.co/adamfendri/FaiedrobertaBaseF \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_en.md b/docs/_posts/ahmedlone127/2024-09-14-finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_en.md new file mode 100644 index 00000000000000..e25d7831515061 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k MarianTransformer from HugginJake +author: John Snow Labs +name: finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k` is a English model originally trained by HugginJake. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_en_5.5.0_3.0_1726350548710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_en_5.5.0_3.0_1726350548710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|509.5 MB| + +## References + +https://huggingface.co/HugginJake/Finetuned_MarianMTModel_v4_specialFrom_ccmatrix77k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline_en.md new file mode 100644 index 00000000000000..55a96c476f4c45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline pipeline MarianTransformer from HugginJake +author: John Snow Labs +name: finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline` is a English model originally trained by HugginJake. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline_en_5.5.0_3.0_1726350575974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline_en_5.5.0_3.0_1726350575974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_marianmtmodel_v4_specialfrom_ccmatrix77k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|510.1 MB| + +## References + +https://huggingface.co/HugginJake/Finetuned_MarianMTModel_v4_specialFrom_ccmatrix77k + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finroberta_en.md b/docs/_posts/ahmedlone127/2024-09-14-finroberta_en.md new file mode 100644 index 00000000000000..000a2c4c96029e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finroberta RoBertaEmbeddings from sm6342 +author: John Snow Labs +name: finroberta +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finroberta` is a English model originally trained by sm6342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finroberta_en_5.5.0_3.0_1726300120854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finroberta_en_5.5.0_3.0_1726300120854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("finroberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("finroberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.7 MB| + +## References + +https://huggingface.co/sm6342/FinRoberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-finroberta_pipeline_en.md new file mode 100644 index 00000000000000..61670ed019aff9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finroberta_pipeline pipeline RoBertaEmbeddings from sm6342 +author: John Snow Labs +name: finroberta_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finroberta_pipeline` is a English model originally trained by sm6342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finroberta_pipeline_en_5.5.0_3.0_1726300135062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finroberta_pipeline_en_5.5.0_3.0_1726300135062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.7 MB| + +## References + +https://huggingface.co/sm6342/FinRoberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-formalrobertaaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-formalrobertaaa_pipeline_en.md new file mode 100644 index 00000000000000..fac1b60bb754d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-formalrobertaaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English formalrobertaaa_pipeline pipeline RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: formalrobertaaa_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`formalrobertaaa_pipeline` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/formalrobertaaa_pipeline_en_5.5.0_3.0_1726334888885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/formalrobertaaa_pipeline_en_5.5.0_3.0_1726334888885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("formalrobertaaa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("formalrobertaaa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|formalrobertaaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/BigSalmon/FormalRobertaaa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-gal_ner_iw_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-gal_ner_iw_4_pipeline_en.md new file mode 100644 index 00000000000000..6b4055595bb9a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-gal_ner_iw_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_ner_iw_4_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iw_4_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iw_4_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iw_4_pipeline_en_5.5.0_3.0_1726292043271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iw_4_pipeline_en_5.5.0_3.0_1726292043271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_ner_iw_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_ner_iw_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iw_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|419.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iw-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-gischat_weibo_100k_fine_tuned_bert_zh.md b/docs/_posts/ahmedlone127/2024-09-14-gischat_weibo_100k_fine_tuned_bert_zh.md new file mode 100644 index 00000000000000..f59607521053dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-gischat_weibo_100k_fine_tuned_bert_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese gischat_weibo_100k_fine_tuned_bert BertForSequenceClassification from wsqstar +author: John Snow Labs +name: gischat_weibo_100k_fine_tuned_bert +date: 2024-09-14 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gischat_weibo_100k_fine_tuned_bert` is a Chinese model originally trained by wsqstar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gischat_weibo_100k_fine_tuned_bert_zh_5.5.0_3.0_1726348344308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gischat_weibo_100k_fine_tuned_bert_zh_5.5.0_3.0_1726348344308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("gischat_weibo_100k_fine_tuned_bert","zh") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("gischat_weibo_100k_fine_tuned_bert", "zh")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
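+
+After `transform`, the predicted label for each row is available in the `class` column:
+
+```python
+# Show the input text next to its predicted class label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```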
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gischat_weibo_100k_fine_tuned_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/wsqstar/GISchat-weibo-100k-fine-tuned-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-14-hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline_fa.md new file mode 100644 index 00000000000000..0c45f69c0b2522 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline pipeline RoBertaForTokenClassification from PerSpaCor +author: John Snow Labs +name: hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline +date: 2024-09-14 +tags: [fa, open_source, pipeline, onnx] +task: Named Entity Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline` is a Persian model originally trained by PerSpaCor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline_fa_5.5.0_3.0_1726314945838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline_fa_5.5.0_3.0_1726314945838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hooshvarelab_roberta_persian_farsi_zwnj_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|442.1 MB| + +## References + +https://huggingface.co/PerSpaCor/HooshvareLab-roberta-fa-zwnj-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-hw_slots_atis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-hw_slots_atis_pipeline_en.md new file mode 100644 index 00000000000000..aceb4689f9a0d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-hw_slots_atis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw_slots_atis_pipeline pipeline RoBertaForTokenClassification from RaushanTurganbay +author: John Snow Labs +name: hw_slots_atis_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_slots_atis_pipeline` is a English model originally trained by RaushanTurganbay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_slots_atis_pipeline_en_5.5.0_3.0_1726306801237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_slots_atis_pipeline_en_5.5.0_3.0_1726306801237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hw_slots_atis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hw_slots_atis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_slots_atis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/RaushanTurganbay/hw-slots-atis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_en.md b/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_en.md new file mode 100644 index 00000000000000..e872cc3d265165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English intent_identifier_13 BertForSequenceClassification from dotzero24 +author: John Snow Labs +name: intent_identifier_13 +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_identifier_13` is a English model originally trained by dotzero24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_identifier_13_en_5.5.0_3.0_1726347974579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_identifier_13_en_5.5.0_3.0_1726347974579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("intent_identifier_13","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("intent_identifier_13", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_identifier_13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/dotzero24/intent_identifier-13 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_small_ctx0_cwd0_english_french_en.md b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_small_ctx0_cwd0_english_french_en.md new file mode 100644 index 00000000000000..2ee5cac90c7170 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_small_ctx0_cwd0_english_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English iwslt17_marian_small_ctx0_cwd0_english_french MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_small_ctx0_cwd0_english_french +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_small_ctx0_cwd0_english_french` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx0_cwd0_english_french_en_5.5.0_3.0_1726351456520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx0_cwd0_english_french_en_5.5.0_3.0_1726351456520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx0_cwd0_english_french","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx0_cwd0_english_french","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
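+
+If inputs run longer than the transformer's default sequence limits, the length caps can be raised when loading the model; the values below are illustrative only:
+
+```python
+# Optional length tuning for longer documents; 512 is an example, not a
+# recommendation from the model author.
+marian_long = MarianTransformer.pretrained("iwslt17_marian_small_ctx0_cwd0_english_french","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation") \
+    .setMaxInputLength(512) \
+    .setMaxOutputLength(512)
+```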
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_small_ctx0_cwd0_english_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-small-ctx0-cwd0-en-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-java_javascript_modelv3_en.md b/docs/_posts/ahmedlone127/2024-09-14-java_javascript_modelv3_en.md new file mode 100644 index 00000000000000..02b890f61a898c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-java_javascript_modelv3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English java_javascript_modelv3 RoBertaEmbeddings from dilanjt +author: John Snow Labs +name: java_javascript_modelv3 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`java_javascript_modelv3` is a English model originally trained by dilanjt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/java_javascript_modelv3_en_5.5.0_3.0_1726338455619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/java_javascript_modelv3_en_5.5.0_3.0_1726338455619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("java_javascript_modelv3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("java_javascript_modelv3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|java_javascript_modelv3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/dilanjt/java-javascript-modelv3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-luxembert_v2_en.md b/docs/_posts/ahmedlone127/2024-09-14-luxembert_v2_en.md new file mode 100644 index 00000000000000..d69d3d8249a9a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-luxembert_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English luxembert_v2 BertEmbeddings from iolariu +author: John Snow Labs +name: luxembert_v2 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`luxembert_v2` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/luxembert_v2_en_5.5.0_3.0_1726312793591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/luxembert_v2_en_5.5.0_3.0_1726312793591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("luxembert_v2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("luxembert_v2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
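+
+If the same embedding stage is reused across jobs, the fitted pipeline can be saved once and reloaded later, which avoids re-downloading the model. A sketch, with an illustrative path:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Persist the fitted pipeline (model weights included) and restore it in another job.
+pipelineModel.write().overwrite().save("/tmp/luxembert_v2_pipeline")
+restored = PipelineModel.load("/tmp/luxembert_v2_pipeline")
+restored.transform(data).select("embeddings.result").show(truncate=False)
+```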
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|luxembert_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/iolariu/LuxemBERT-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-mabel_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-14-mabel_roberta_large_en.md new file mode 100644 index 00000000000000..4c4b4aeca2c566 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-mabel_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mabel_roberta_large RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: mabel_roberta_large +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mabel_roberta_large` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mabel_roberta_large_en_5.5.0_3.0_1726300466737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mabel_roberta_large_en_5.5.0_3.0_1726300466737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mabel_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mabel_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
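+
+To hand the vectors to downstream Spark ML stages, an `EmbeddingsFinisher` can be appended to the same pipeline. A sketch under the column names configured above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts annotation embeddings into plain Spark ML vectors.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+pipeline.fit(data).transform(data).select("finished_embeddings").show(truncate=80)
+```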
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mabel_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/princeton-nlp/mabel-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-mabel_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-mabel_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..050c4d2e1b1dd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-mabel_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mabel_roberta_large_pipeline pipeline RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: mabel_roberta_large_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mabel_roberta_large_pipeline` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mabel_roberta_large_pipeline_en_5.5.0_3.0_1726300525329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mabel_roberta_large_pipeline_en_5.5.0_3.0_1726300525329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mabel_roberta_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mabel_roberta_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
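+
+The snippet above assumes a DataFrame `df` that already holds the raw text. A minimal sketch of preparing one and checking which annotation columns the pipeline adds:
+
+```python
+# The pretrained pipeline expects the input text in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```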
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mabel_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/princeton-nlp/mabel-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marathi_sentiment_tweets_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-14-marathi_sentiment_tweets_pipeline_mr.md new file mode 100644 index 00000000000000..37dc3d36604f7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marathi_sentiment_tweets_pipeline_mr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Marathi marathi_sentiment_tweets_pipeline pipeline BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_tweets_pipeline +date: 2024-09-14 +tags: [mr, open_source, pipeline, onnx] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_tweets_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_tweets_pipeline_mr_5.5.0_3.0_1726348348910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_tweets_pipeline_mr_5.5.0_3.0_1726348348910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marathi_sentiment_tweets_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marathi_sentiment_tweets_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
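+
+For quick checks on a single tweet, the pretrained pipeline also exposes `annotate`. The sample sentence below and the `class` output key are illustrative assumptions (the key follows the classifier's usual output column):
+
+```python
+# annotate() returns a plain dict of lists, keyed by each stage's output column.
+result = pipeline.annotate("मला हा चित्रपट खूप आवडला")
+print(result["class"])
+```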
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd_en.md new file mode 100644 index 00000000000000..5975a810eb8e48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd MarianTransformer from cwtmyd +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd` is a English model originally trained by cwtmyd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd_en_5.5.0_3.0_1726351407503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd_en_5.5.0_3.0_1726351407503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
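+
+With the columns wired as above, the translated text lands in the `translation` annotation column. A quick way to inspect it:
+
+```python
+# One translated annotation is produced per detected sentence.
+pipelineDF.selectExpr("explode(translation.result) as translated_text") \
+    .show(truncate=False)
+```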
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_cwtmyd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/cwtmyd/marian-finetuned-kde4-en-to-fr-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course_en.md new file mode 100644 index 00000000000000..72901c6c836a26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course MarianTransformer from huggingface-course +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course` is a English model originally trained by huggingface-course. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course_en_5.5.0_3.0_1726350548751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course_en_5.5.0_3.0_1726350548751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_huggingface_course| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/huggingface-course/marian-finetuned-kde4-en-to-fr-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen_en.md new file mode 100644 index 00000000000000..d7e11c64c42e13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen MarianTransformer from Ahmed-Abdeen +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen` is a English model originally trained by Ahmed-Abdeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen_en_5.5.0_3.0_1726351234666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen_en_5.5.0_3.0_1726351234666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ahmed_abdeen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/Ahmed-Abdeen/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_en.md new file mode 100644 index 00000000000000..bad48fd8723c0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11 MarianTransformer from artvandelay11 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11 +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11` is a English model originally trained by artvandelay11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_en_5.5.0_3.0_1726351614376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_en_5.5.0_3.0_1726351614376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/artvandelay11/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline_en.md new file mode 100644 index 00000000000000..a13860804f4444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline pipeline MarianTransformer from artvandelay11 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline` is a English model originally trained by artvandelay11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline_en_5.5.0_3.0_1726351637600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline_en_5.5.0_3.0_1726351637600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
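+
+`fullAnnotate` runs the same pretrained pipeline on raw strings and returns Annotation objects with begin/end offsets. The `translation` key below is an assumption about the pipeline's output column name:
+
+```python
+results = pipeline.fullAnnotate("I love spark-nlp")
+# Each element corresponds to one input string.
+for annotation in results[0]["translation"]:
+    print(annotation.result)
+```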
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_artvandelay11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/artvandelay11/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma_en.md new file mode 100644 index 00000000000000..6184ea61d33717 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma MarianTransformer from muyuanma +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma` is a English model originally trained by muyuanma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma_en_5.5.0_3.0_1726351151211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma_en_5.5.0_3.0_1726351151211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_muyuanma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/muyuanma/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_en.md b/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_en.md new file mode 100644 index 00000000000000..335fc73bf2436a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English medical_english_chinese_9_1_pt2 MarianTransformer from DogGoesBark +author: John Snow Labs +name: medical_english_chinese_9_1_pt2 +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_english_chinese_9_1_pt2` is a English model originally trained by DogGoesBark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_english_chinese_9_1_pt2_en_5.5.0_3.0_1726351395558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_english_chinese_9_1_pt2_en_5.5.0_3.0_1726351395558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("medical_english_chinese_9_1_pt2","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("medical_english_chinese_9_1_pt2","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
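+
+Clinical sentences can run long, so the MarianTransformer length controls may be worth tuning. A sketch with illustrative values; the setters are assumed from the annotator's standard parameters:
+
+```python
+# Caps on input/output length (in tokens) keep memory use predictable on long records.
+marian = MarianTransformer.pretrained("medical_english_chinese_9_1_pt2","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation") \
+    .setMaxInputLength(512) \
+    .setMaxOutputLength(512)
+```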
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_english_chinese_9_1_pt2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|541.4 MB| + +## References + +https://huggingface.co/DogGoesBark/medical_en_zh_9_1_pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medical_tiny_english_1_0v_en.md b/docs/_posts/ahmedlone127/2024-09-14-medical_tiny_english_1_0v_en.md new file mode 100644 index 00000000000000..1d9850370ad129 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medical_tiny_english_1_0v_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English medical_tiny_english_1_0v WhisperForCTC from Dev372 +author: John Snow Labs +name: medical_tiny_english_1_0v +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_tiny_english_1_0v` is a English model originally trained by Dev372. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_tiny_english_1_0v_en_5.5.0_3.0_1726298904873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_tiny_english_1_0v_en_5.5.0_3.0_1726298904873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("medical_tiny_english_1_0v","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame with a float-array "audio_content" column.
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("medical_tiny_english_1_0v", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame with a float-array "audio_content" column.
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
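+
+The example above assumes `data` already holds the recordings. One way to build such a DataFrame is to load a 16 kHz mono waveform as a float array; librosa is used purely as an illustration and any resampling tool works:
+
+```python
+import librosa
+
+# Whisper expects 16 kHz mono audio supplied as an array of floats.
+waveform, _ = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([[waveform.tolist()]], ["audio_content"])
+```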
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_tiny_english_1_0v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.0 MB| + +## References + +https://huggingface.co/Dev372/Medical_tiny_en_1_0v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-memo_bert_sanskrit_saskta_danskbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-memo_bert_sanskrit_saskta_danskbert_pipeline_en.md new file mode 100644 index 00000000000000..2cc5200be59dbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-memo_bert_sanskrit_saskta_danskbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English memo_bert_sanskrit_saskta_danskbert_pipeline pipeline XlmRoBertaForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_sanskrit_saskta_danskbert_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_sanskrit_saskta_danskbert_pipeline` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_sanskrit_saskta_danskbert_pipeline_en_5.5.0_3.0_1726318526082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_sanskrit_saskta_danskbert_pipeline_en_5.5.0_3.0_1726318526082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("memo_bert_sanskrit_saskta_danskbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("memo_bert_sanskrit_saskta_danskbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_sanskrit_saskta_danskbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.6 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-SA_DanskBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-model_albert_20_en.md b/docs/_posts/ahmedlone127/2024-09-14-model_albert_20_en.md new file mode 100644 index 00000000000000..67d69c6e442cea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-model_albert_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_albert_20 AlbertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: model_albert_20 +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_albert_20` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_albert_20_en_5.5.0_3.0_1726315482534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_albert_20_en_5.5.0_3.0_1726315482534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = AlbertForSequenceClassification.pretrained("model_albert_20","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = AlbertForSequenceClassification.pretrained("model_albert_20", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
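+
+For low-latency scoring of individual texts outside a DataFrame, the fitted pipeline can be wrapped in a `LightPipeline`. A short sketch:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages on plain Python strings, skipping Spark jobs.
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```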
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_albert_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/KalaiselvanD/model_albert_20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-model_for_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-model_for_french_pipeline_en.md new file mode 100644 index 00000000000000..83cf07edc991fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-model_for_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_for_french_pipeline pipeline XlmRoBertaForTokenClassification from LGLT +author: John Snow Labs +name: model_for_french_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_for_french_pipeline` is a English model originally trained by LGLT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_for_french_pipeline_en_5.5.0_3.0_1726346028955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_for_french_pipeline_en_5.5.0_3.0_1726346028955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_for_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_for_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
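+
+The transform call above needs a DataFrame `df` with the input text, and the predicted tags come back in the token classifier's output column; `ner` is an assumption here, so check `printSchema()` if it differs:
+
+```python
+df = spark.createDataFrame([["George Washington est allé à Washington"]]).toDF("text")
+annotations = pipeline.transform(df)
+# Token strings and their predicted tags, aligned by position.
+annotations.select("token.result", "ner.result").show(truncate=False)
+```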
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_for_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|781.6 MB| + +## References + +https://huggingface.co/LGLT/model_for_fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi_en.md new file mode 100644 index 00000000000000..2d574d20ae03c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi MarianTransformer from minzzi +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi` is a English model originally trained by minzzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi_en_5.5.0_3.0_1726350546271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi_en_5.5.0_3.0_1726350546271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_minzzi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/minzzi/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_en.md new file mode 100644 index 00000000000000..0e32c4f4dbe8e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123 MarianTransformer from soulking123 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123 +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123` is a English model originally trained by soulking123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_en_5.5.0_3.0_1726350915095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_en_5.5.0_3.0_1726350915095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/soulking123/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline_en.md new file mode 100644 index 00000000000000..19e6524da18f63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline pipeline MarianTransformer from soulking123 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline` is a English model originally trained by soulking123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline_en_5.5.0_3.0_1726350940079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline_en_5.5.0_3.0_1726350940079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_soulking123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/soulking123/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_indonesian_english_jakarta_best_loss_bleu_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_indonesian_english_jakarta_best_loss_bleu_en.md new file mode 100644 index 00000000000000..ffb029d8670f40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_indonesian_english_jakarta_best_loss_bleu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_indonesian_english_jakarta_best_loss_bleu MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_indonesian_english_jakarta_best_loss_bleu +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_indonesian_english_jakarta_best_loss_bleu` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_indonesian_english_jakarta_best_loss_bleu_en_5.5.0_3.0_1726351668227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_indonesian_english_jakarta_best_loss_bleu_en_5.5.0_3.0_1726351668227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_indonesian_english_jakarta_best_loss_bleu","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_indonesian_english_jakarta_best_loss_bleu","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_indonesian_english_jakarta_best_loss_bleu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|480.1 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-id-en-jakarta-best-loss-bleu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_italian_english_bds_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_italian_english_bds_en.md new file mode 100644 index 00000000000000..407db13fd71635 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_italian_english_bds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_italian_english_bds MarianTransformer from Anhptp +author: John Snow Labs +name: opus_maltese_italian_english_bds +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_italian_english_bds` is a English model originally trained by Anhptp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_italian_english_bds_en_5.5.0_3.0_1726351758278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_italian_english_bds_en_5.5.0_3.0_1726351758278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ +  .setInputCol("text") \ +  .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ +  .setInputCols(["document"]) \ +  .setOutputCol("sentence") + +marian = MarianTransformer.pretrained("opus_maltese_italian_english_bds","en") \ +  .setInputCols(["sentence"]) \ +  .setOutputCol("translation") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() +  .setInputCol("text") +  .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") +  .setInputCols(Array("document")) +  .setOutputCol("sentence") + +val marian = MarianTransformer.pretrained("opus_maltese_italian_english_bds","en") +  .setInputCols(Array("sentence")) +  .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
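For quick experiments on single strings, a `LightPipeline` avoids building a DataFrame for every call. A hedged sketch that reuses the `pipelineModel` fitted in the example above; the sample Italian sentence is illustrative only:

```python
from sparknlp.base import LightPipeline

# Runs the same stages on plain Python strings, without a Spark DataFrame
light = LightPipeline(pipelineModel)
result = light.annotate("Mi piace imparare nuove lingue.")  # illustrative Italian input
print(result["translation"])  # keys follow the output column names set in the pipeline
```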
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_italian_english_bds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|625.0 MB| + +## References + +https://huggingface.co/Anhptp/opus-mt-it-en-BDS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline_en.md new file mode 100644 index 00000000000000..2ed02627c9e83e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline pipeline RoBertaForQuestionAnswering from AnonymousSub +author: John Snow Labs +name: output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline_en_5.5.0_3.0_1726342958589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline_en_5.5.0_3.0_1726342958589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
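The snippet above assumes an existing DataFrame `df` but never builds one. A minimal sketch of an input for this question-answering pipeline; the `question`/`context` column names and the `answer` output column are assumptions about how the bundled stages were saved, so check `pipeline.model.stages` if the transform reports missing columns:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline", lang="en")

# Assumed input schema: one question/context pair per row
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]) \
    .toDF("question", "context")

annotations = pipeline.transform(df)
annotations.selectExpr("answer.result").show(truncate=False)  # "answer" is the assumed output column
```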
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_large_both_epochs_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/AnonymousSub/output_mask_step_pretraining_plus_contr_roberta_model_from_pretrained_LARGE_BOTH_EPOCHS_1 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-parliament_ner_en.md b/docs/_posts/ahmedlone127/2024-09-14-parliament_ner_en.md new file mode 100644 index 00000000000000..cf9383b4123540 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-parliament_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English parliament_ner RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: parliament_ner +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`parliament_ner` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/parliament_ner_en_5.5.0_3.0_1726314801416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/parliament_ner_en_5.5.0_3.0_1726314801416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ +  .setInputCol('text') \ +  .setOutputCol('document') + +tokenizer = Tokenizer() \ +  .setInputCols(['document']) \ +  .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("parliament_ner","en") \ +  .setInputCols(["document","token"]) \ +  .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() +  .setInputCol("text") +  .setOutputCol("document") + +val tokenizer = new Tokenizer() +  .setInputCols("document") +  .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("parliament_ner", "en") +  .setInputCols(Array("document","token")) +  .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
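The `ner` column produced above holds one IOB tag per token. If grouped entity spans are more convenient, a `NerConverter` stage can be applied on top; a short sketch using the column names from the example:

```python
from sparknlp.annotator import NerConverter

# Merges consecutive B-/I- tags into whole entity chunks
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunks = converter.transform(pipelineDF)
chunks.selectExpr("explode(ner_chunk.result) as entity").show(truncate=False)
```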
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|parliament_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/parliament_ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-personal_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-14-personal_pipeline_es.md new file mode 100644 index 00000000000000..eb3337a3ee4a1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-personal_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish personal_pipeline pipeline BertForQuestionAnswering from Antonio49 +author: John Snow Labs +name: personal_pipeline +date: 2024-09-14 +tags: [es, open_source, pipeline, onnx] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`personal_pipeline` is a Castilian, Spanish model originally trained by Antonio49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personal_pipeline_es_5.5.0_3.0_1726349850093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personal_pipeline_es_5.5.0_3.0_1726349850093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("personal_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("personal_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Antonio49/Personal + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_pipeline_en.md new file mode 100644 index 00000000000000..00f2d817313c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pig_latin_tonga_tonga_islands_eng_pipeline pipeline MarianTransformer from soschuetze +author: John Snow Labs +name: pig_latin_tonga_tonga_islands_eng_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pig_latin_tonga_tonga_islands_eng_pipeline` is a English model originally trained by soschuetze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pig_latin_tonga_tonga_islands_eng_pipeline_en_5.5.0_3.0_1726350941423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pig_latin_tonga_tonga_islands_eng_pipeline_en_5.5.0_3.0_1726350941423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pig_latin_tonga_tonga_islands_eng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pig_latin_tonga_tonga_islands_eng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
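As with the other pretrained pipelines, `df` has to be supplied by the caller. A hedged sketch, assuming the bundled DocumentAssembler reads a `text` column and the Marian stage writes to a column named `translation` (both names should be verified against `pipeline.model.stages`):

```python
# Illustrative Pig Latin input; replace with your own data
df = spark.createDataFrame([["ellohay orldway"]]).toDF("text")

annotations = pipeline.transform(df)
annotations.printSchema()  # confirm the actual output column before selecting from it
annotations.selectExpr("translation.result").show(truncate=False)
```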
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pig_latin_tonga_tonga_islands_eng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|533.2 MB| + +## References + +https://huggingface.co/soschuetze/pig-latin-to-eng + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-pipeline1model1_en.md b/docs/_posts/ahmedlone127/2024-09-14-pipeline1model1_en.md new file mode 100644 index 00000000000000..a777a41036dbe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-pipeline1model1_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English pipeline1model1 pipeline WhisperForCTC from avery0 +author: John Snow Labs +name: pipeline1model1 +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pipeline1model1` is a English model originally trained by avery0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline1model1_en_5.5.0_3.0_1726284046766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pipeline1model1_en_5.5.0_3.0_1726284046766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pipeline1model1", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pipeline1model1", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pipeline1model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|645.3 MB| + +## References + +https://huggingface.co/avery0/pipeline1model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-pipeline1model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-pipeline1model1_pipeline_en.md new file mode 100644 index 00000000000000..740e17c1cee3d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-pipeline1model1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pipeline1model1_pipeline pipeline WhisperForCTC from avery0 +author: John Snow Labs +name: pipeline1model1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pipeline1model1_pipeline` is a English model originally trained by avery0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline1model1_pipeline_en_5.5.0_3.0_1726284077779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pipeline1model1_pipeline_en_5.5.0_3.0_1726284077779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pipeline1model1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pipeline1model1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
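Speech pipelines consume raw audio samples rather than text, so `df` needs a column of floats. A hedged sketch using librosa for loading (an external dependency, not part of Spark NLP) and assuming the bundled AudioAssembler reads a column named `audio_content` and the transcription lands in `text`:

```python
import librosa

# Whisper checkpoints expect 16 kHz mono audio
samples, _ = librosa.load("sample.wav", sr=16000)

df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.selectExpr("text.result").show(truncate=False)  # "text" is the assumed transcription column
```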
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pipeline1model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|645.3 MB| + +## References + +https://huggingface.co/avery0/pipeline1model1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-polish_roberta_base_v1_pl.md b/docs/_posts/ahmedlone127/2024-09-14-polish_roberta_base_v1_pl.md new file mode 100644 index 00000000000000..41387d040a904c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-polish_roberta_base_v1_pl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Polish polish_roberta_base_v1 RoBertaEmbeddings from sdadas +author: John Snow Labs +name: polish_roberta_base_v1 +date: 2024-09-14 +tags: [pl, open_source, onnx, embeddings, roberta] +task: Embeddings +language: pl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polish_roberta_base_v1` is a Polish model originally trained by sdadas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polish_roberta_base_v1_pl_5.5.0_3.0_1726334169357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polish_roberta_base_v1_pl_5.5.0_3.0_1726334169357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("polish_roberta_base_v1","pl") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("polish_roberta_base_v1","pl") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
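Each token in the input receives one vector in the `embeddings` column. A short sketch for pulling tokens and vectors out of the `pipelineDF` built above (column names follow the example code):

```python
from pyspark.sql import functions as F

# One row per token, with its embedding vector
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```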
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polish_roberta_base_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|pl| +|Size:|298.0 MB| + +## References + +https://huggingface.co/sdadas/polish-roberta-base-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-prags1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-prags1_pipeline_en.md new file mode 100644 index 00000000000000..d151213a84e99e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-prags1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prags1_pipeline pipeline RoBertaEmbeddings from UBC-NLP +author: John Snow Labs +name: prags1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prags1_pipeline` is a English model originally trained by UBC-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prags1_pipeline_en_5.5.0_3.0_1726334506762.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prags1_pipeline_en_5.5.0_3.0_1726334506762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("prags1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("prags1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
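Here too the input DataFrame is left to the caller. A minimal sketch, assuming the pipeline's DocumentAssembler reads a `text` column and its embeddings stage writes to `embeddings` (both assumptions; inspect `pipeline.model.stages` to confirm):

```python
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
annotations.printSchema()  # verify the embeddings output column name before selecting from it
```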
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prags1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/UBC-NLP/prags1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline_es.md new file mode 100644 index 00000000000000..8db8fe75c401a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline pipeline RoBertaForTokenClassification from BSC-LT +author: John Snow Labs +name: roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline +date: 2024-09-14 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline` is a Castilian, Spanish model originally trained by BSC-LT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline_es_5.5.0_3.0_1726300961156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline_es_5.5.0_3.0_1726300961156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
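A small usage sketch for this Spanish NER pipeline; the `text` input column is an assumption about how the bundled DocumentAssembler was saved, and the example sentence is illustrative only:

```python
# Illustrative Spanish sentence
df = spark.createDataFrame([["Juana vive en Barcelona y trabaja en el hospital."]]).toDF("text")

annotations = pipeline.transform(df)
annotations.printSchema()  # check which NER output columns the bundled stages expose
```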
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_capitel_ner_plus_bsc_lt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|458.2 MB| + +## References + +https://huggingface.co/BSC-LT/roberta-base-bne-capitel-ner-plus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_scandinavian_da.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_scandinavian_da.md new file mode 100644 index 00000000000000..748259ce1ddc04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_scandinavian_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish roberta_base_scandinavian RoBertaEmbeddings from DDSC +author: John Snow Labs +name: roberta_base_scandinavian +date: 2024-09-14 +tags: [da, open_source, onnx, embeddings, roberta] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_scandinavian` is a Danish model originally trained by DDSC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_scandinavian_da_5.5.0_3.0_1726338012777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_scandinavian_da_5.5.0_3.0_1726338012777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_scandinavian","da") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_scandinavian","da") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_scandinavian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|da| +|Size:|466.3 MB| + +## References + +https://huggingface.co/DDSC/roberta-base-scandinavian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_squad2_train_data_unmodified_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_squad2_train_data_unmodified_en.md new file mode 100644 index 00000000000000..43db091234e58e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_squad2_train_data_unmodified_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_squad2_train_data_unmodified RoBertaForQuestionAnswering from mdzrg +author: John Snow Labs +name: roberta_base_squad2_train_data_unmodified +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad2_train_data_unmodified` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_train_data_unmodified_en_5.5.0_3.0_1726343216685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_train_data_unmodified_en_5.5.0_3.0_1726343216685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ +  .setInputCols(["question", "context"]) \ +  .setOutputCols(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad2_train_data_unmodified","en") \ +  .setInputCols(["document_question","document_context"]) \ +  .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() +  .setInputCols(Array("question", "context")) +  .setOutputCols(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad2_train_data_unmodified", "en") +  .setInputCols(Array("document_question","document_context")) +  .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
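Once `transform` has run, the predicted span can be read back from the `answer` column set in the example, alongside the original question:

```python
# Annotation results are arrays of strings; here each row carries a single answer span
pipelineDF.selectExpr(
    "document_question.result as question",
    "answer.result as answer"
).show(truncate=False)
```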
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad2_train_data_unmodified| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.5 MB| + +## References + +https://huggingface.co/mdzrg/roberta-base-squad2-train-data-unmodified \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_cyner_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_cyner_en.md new file mode 100644 index 00000000000000..2a1601940d1f23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_cyner_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English roberta_cyner RoBertaForTokenClassification from anonymouspd +author: John Snow Labs +name: roberta_cyner +date: 2024-09-14 +tags: [roberta, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cyner` is a English model originally trained by anonymouspd. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cyner_en_5.5.0_3.0_1726306426126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cyner_en_5.5.0_3.0_1726306426126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ +  .setInputCol("text") \ +  .setOutputCol("document") + +tokenizer = Tokenizer() \ +  .setInputCols(["document"]) \ +  .setOutputCol("token") + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_cyner","en") \ +  .setInputCols(["document","token"]) \ +  .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() +  .setInputCol("text") +  .setOutputCol("document") + +val tokenizer = new Tokenizer() +  .setInputCols(Array("document")) +  .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification +  .pretrained("roberta_cyner", "en") +  .setInputCols(Array("document","token")) +  .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cyner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|425.9 MB| + +## References + +References + +https://huggingface.co/anonymouspd/RoBERTa-CyNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_finetuned_subjqa_movies_2_skullwreker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_finetuned_subjqa_movies_2_skullwreker_pipeline_en.md new file mode 100644 index 00000000000000..2230470e92713f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_finetuned_subjqa_movies_2_skullwreker_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_skullwreker_pipeline pipeline RoBertaForQuestionAnswering from SkullWreker +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_skullwreker_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_skullwreker_pipeline` is a English model originally trained by SkullWreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_skullwreker_pipeline_en_5.5.0_3.0_1726343180292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_skullwreker_pipeline_en_5.5.0_3.0_1726343180292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_subjqa_movies_2_skullwreker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_movies_2_skullwreker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_skullwreker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/SkullWreker/roberta-finetuned-subjqa-movies_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline_en.md new file mode 100644 index 00000000000000..4f17c2e05c64d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline pipeline RoBertaForQuestionAnswering from rizquuula +author: John Snow Labs +name: roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline` is a English model originally trained by rizquuula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline_en_5.5.0_3.0_1726343111481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline_en_5.5.0_3.0_1726343111481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_indosquadv2_1691592486_16_2e_05_0_01_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|454.5 MB| + +## References + +https://huggingface.co/rizquuula/RoBERTa-IndoSQuADv2_1691592486-16-2e-05-0.01-5 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_few_shot_k_64_finetuned_squad_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_few_shot_k_64_finetuned_squad_seed_2_en.md new file mode 100644 index 00000000000000..71f9ac314b0493 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_few_shot_k_64_finetuned_squad_seed_2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_few_shot_k_64_finetuned_squad_seed_2 RoBertaForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: roberta_large_few_shot_k_64_finetuned_squad_seed_2 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_few_shot_k_64_finetuned_squad_seed_2` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_64_finetuned_squad_seed_2_en_5.5.0_3.0_1726342758941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_64_finetuned_squad_seed_2_en_5.5.0_3.0_1726342758941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ +  .setInputCols(["question", "context"]) \ +  .setOutputCols(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_few_shot_k_64_finetuned_squad_seed_2","en") \ +  .setInputCols(["document_question","document_context"]) \ +  .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() +  .setInputCols(Array("question", "context")) +  .setOutputCols(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_few_shot_k_64_finetuned_squad_seed_2", "en") +  .setInputCols(Array("document_question","document_context")) +  .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_few_shot_k_64_finetuned_squad_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anas-awadalla/roberta-large-few-shot-k-64-finetuned-squad-seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline_en.md new file mode 100644 index 00000000000000..3acb3700f60da5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline pipeline RoBertaForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline_en_5.5.0_3.0_1726342841388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline_en_5.5.0_3.0_1726342841388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_few_shot_k_64_finetuned_squad_seed_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anas-awadalla/roberta-large-few-shot-k-64-finetuned-squad-seed-2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_en.md new file mode 100644 index 00000000000000..06b86ddc762f8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_metaie RoBertaForTokenClassification from KomeijiForce +author: John Snow Labs +name: roberta_large_metaie +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_metaie` is a English model originally trained by KomeijiForce. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_en_5.5.0_3.0_1726307244599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_en_5.5.0_3.0_1726307244599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ +  .setInputCol('text') \ +  .setOutputCol('document') + +tokenizer = Tokenizer() \ +  .setInputCols(['document']) \ +  .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_metaie","en") \ +  .setInputCols(["document","token"]) \ +  .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() +  .setInputCol("text") +  .setOutputCol("document") + +val tokenizer = new Tokenizer() +  .setInputCols("document") +  .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_metaie", "en") +  .setInputCols(Array("document","token")) +  .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_metaie| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/KomeijiForce/roberta-large-metaie \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_gpt4_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_gpt4_en.md new file mode 100644 index 00000000000000..49582788ea1a6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_gpt4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_metaie_gpt4 RoBertaForTokenClassification from KomeijiForce +author: John Snow Labs +name: roberta_large_metaie_gpt4 +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_metaie_gpt4` is a English model originally trained by KomeijiForce. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_gpt4_en_5.5.0_3.0_1726307118206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_gpt4_en_5.5.0_3.0_1726307118206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ +  .setInputCol('text') \ +  .setOutputCol('document') + +tokenizer = Tokenizer() \ +  .setInputCols(['document']) \ +  .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_metaie_gpt4","en") \ +  .setInputCols(["document","token"]) \ +  .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() +  .setInputCol("text") +  .setOutputCol("document") + +val tokenizer = new Tokenizer() +  .setInputCols("document") +  .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_metaie_gpt4", "en") +  .setInputCols(Array("document","token")) +  .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_metaie_gpt4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/KomeijiForce/roberta-large-metaie-gpt4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_gpt4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_gpt4_pipeline_en.md new file mode 100644 index 00000000000000..cf4ff89fe07ed0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_metaie_gpt4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_metaie_gpt4_pipeline pipeline RoBertaForTokenClassification from KomeijiForce +author: John Snow Labs +name: roberta_large_metaie_gpt4_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_metaie_gpt4_pipeline` is a English model originally trained by KomeijiForce. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_gpt4_pipeline_en_5.5.0_3.0_1726307180992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_gpt4_pipeline_en_5.5.0_3.0_1726307180992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_metaie_gpt4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_metaie_gpt4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_metaie_gpt4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/KomeijiForce/roberta-large-metaie-gpt4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_switchboard_normalized_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_switchboard_normalized_en.md new file mode 100644 index 00000000000000..4183ba60280b0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_switchboard_normalized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_switchboard_normalized RoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: roberta_large_switchboard_normalized +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_switchboard_normalized` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_switchboard_normalized_en_5.5.0_3.0_1726316188680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_switchboard_normalized_en_5.5.0_3.0_1726316188680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ +  .setInputCol('text') \ +  .setOutputCol('document') + +tokenizer = Tokenizer() \ +  .setInputCols(['document']) \ +  .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_switchboard_normalized","en") \ +  .setInputCols(["document","token"]) \ +  .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() +  .setInputCol("text") +  .setOutputCol("document") + +val tokenizer = new Tokenizer() +  .setInputCols("document") +  .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_switchboard_normalized", "en") +  .setInputCols(Array("document","token")) +  .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_switchboard_normalized| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anonymoussubmissions/roberta-large-switchboard-normalized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_spanish_clinical_trials_attributes_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_spanish_clinical_trials_attributes_ner_pipeline_en.md new file mode 100644 index 00000000000000..b772fd4ba3ba88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_spanish_clinical_trials_attributes_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_spanish_clinical_trials_attributes_ner_pipeline pipeline RoBertaForTokenClassification from medspaner +author: John Snow Labs +name: roberta_spanish_clinical_trials_attributes_ner_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_clinical_trials_attributes_ner_pipeline` is a English model originally trained by medspaner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_clinical_trials_attributes_ner_pipeline_en_5.5.0_3.0_1726316298860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_clinical_trials_attributes_ner_pipeline_en_5.5.0_3.0_1726316298860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_spanish_clinical_trials_attributes_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_spanish_clinical_trials_attributes_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_clinical_trials_attributes_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.5 MB| + +## References + +https://huggingface.co/medspaner/roberta-es-clinical-trials-attributes-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-rubert_multiconer_ru.md b/docs/_posts/ahmedlone127/2024-09-14-rubert_multiconer_ru.md new file mode 100644 index 00000000000000..995383ac15e4c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-rubert_multiconer_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_multiconer BertForTokenClassification from bond005 +author: John Snow Labs +name: rubert_multiconer +date: 2024-09-14 +tags: [ru, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_multiconer` is a Russian model originally trained by bond005. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_multiconer_ru_5.5.0_3.0_1726305575834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_multiconer_ru_5.5.0_3.0_1726305575834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("rubert_multiconer","ru") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("rubert_multiconer", "ru")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
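+
+For quick, single-sentence inference without building a DataFrame, the fitted `pipelineModel` from above can be wrapped in a `LightPipeline`. A small sketch (the Russian sentence is only an illustrative input):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# annotate() returns a dict keyed by each stage's output column
+result = light.annotate("Юрий Гагарин родился в Смоленской области.")
+print(list(zip(result["token"], result["ner"])))
+```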
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_multiconer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ru| +|Size:|664.3 MB| + +## References + +https://huggingface.co/bond005/rubert-multiconer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_beto_chile_politico_1990_2019_es.md b/docs/_posts/ahmedlone127/2024-09-14-sent_beto_chile_politico_1990_2019_es.md new file mode 100644 index 00000000000000..4e674c88b7dded --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_beto_chile_politico_1990_2019_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish sent_beto_chile_politico_1990_2019 BertSentenceEmbeddings from lucas-valenzuela-everke +author: John Snow Labs +name: sent_beto_chile_politico_1990_2019 +date: 2024-09-14 +tags: [es, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_beto_chile_politico_1990_2019` is a Castilian, Spanish model originally trained by lucas-valenzuela-everke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_beto_chile_politico_1990_2019_es_5.5.0_3.0_1726320136790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_beto_chile_politico_1990_2019_es_5.5.0_3.0_1726320136790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_beto_chile_politico_1990_2019","es") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_beto_chile_politico_1990_2019","es") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
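+
+The sentence vectors are stored as Spark NLP annotations in the `embeddings` column. One way to look at each sentence alongside its vector dimensionality, reusing the `pipelineDF` built above:
+
+```python
+from pyspark.sql import functions as F
+
+exploded = pipelineDF.selectExpr("explode(embeddings) as emb")
+exploded.select(
+    F.col("emb.result").alias("sentence"),
+    F.size("emb.embeddings").alias("dimensions")
+).show(truncate = False)
+```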
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_beto_chile_politico_1990_2019| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/lucas-valenzuela-everke/BETO-chile-politico-1990-2019 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_guaran_bert_large_cased_gn.md b/docs/_posts/ahmedlone127/2024-09-14-sent_guaran_bert_large_cased_gn.md new file mode 100644 index 00000000000000..2e005cf69a6023 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_guaran_bert_large_cased_gn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Guarani sent_guaran_bert_large_cased BertSentenceEmbeddings from mmaguero +author: John Snow Labs +name: sent_guaran_bert_large_cased +date: 2024-09-14 +tags: [gn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guaran_bert_large_cased` is a Guarani model originally trained by mmaguero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_large_cased_gn_5.5.0_3.0_1726320664800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_large_cased_gn_5.5.0_3.0_1726320664800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_guaran_bert_large_cased","gn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_guaran_bert_large_cased","gn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guaran_bert_large_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gn| +|Size:|1.2 GB| + +## References + +https://huggingface.co/mmaguero/gn-bert-large-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_hinglish_sbert_en.md b/docs/_posts/ahmedlone127/2024-09-14-sent_hinglish_sbert_en.md new file mode 100644 index 00000000000000..9f04eca6c16701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_hinglish_sbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hinglish_sbert BertSentenceEmbeddings from meghanabhange +author: John Snow Labs +name: sent_hinglish_sbert +date: 2024-09-14 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hinglish_sbert` is a English model originally trained by meghanabhange. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hinglish_sbert_en_5.5.0_3.0_1726337218270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hinglish_sbert_en_5.5.0_3.0_1726337218270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hinglish_sbert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hinglish_sbert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
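+
+To feed these vectors into downstream Spark ML stages (clustering, classification, similarity), an `EmbeddingsFinisher` can convert the annotations into plain vectors. A sketch that extends the pipeline defined above:
+
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["sentence_vectors"]) \
+    .setOutputAsVector(True)
+
+finisherPipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
+result = finisherPipeline.fit(data).transform(data)
+result.select("sentence_vectors").show(truncate = 80)
+```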
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hinglish_sbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.8 MB| + +## References + +https://huggingface.co/meghanabhange/hinglish-sbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_indojave_codemixed_indobertweet_base_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-14-sent_indojave_codemixed_indobertweet_base_pipeline_id.md new file mode 100644 index 00000000000000..43097402e3f2fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_indojave_codemixed_indobertweet_base_pipeline_id.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Indonesian sent_indojave_codemixed_indobertweet_base_pipeline pipeline BertSentenceEmbeddings from fathan +author: John Snow Labs +name: sent_indojave_codemixed_indobertweet_base_pipeline +date: 2024-09-14 +tags: [id, open_source, pipeline, onnx] +task: Embeddings +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_indojave_codemixed_indobertweet_base_pipeline` is a Indonesian model originally trained by fathan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_indojave_codemixed_indobertweet_base_pipeline_id_5.5.0_3.0_1726320282334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_indojave_codemixed_indobertweet_base_pipeline_id_5.5.0_3.0_1726320282334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_indojave_codemixed_indobertweet_base_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_indojave_codemixed_indobertweet_base_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
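+
+Besides `transform` on a DataFrame, a `PretrainedPipeline` also exposes `annotate` for ad-hoc strings, which is convenient for quick checks. A short sketch (the Indonesian example text is only illustrative, and the returned keys depend on the stages listed under Included Models):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_indojave_codemixed_indobertweet_base_pipeline", lang = "id")
+
+# annotate() returns a dict keyed by each stage's output column
+result = pipeline.annotate("saya suka spark-nlp")
+print(result.keys())
+```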
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_indojave_codemixed_indobertweet_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|412.4 MB| + +## References + +https://huggingface.co/fathan/indojave-codemixed-indobertweet-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_legal_bertimbau_large_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-14-sent_legal_bertimbau_large_pipeline_pt.md new file mode 100644 index 00000000000000..934b9f8ab87748 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_legal_bertimbau_large_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_legal_bertimbau_large_pipeline pipeline BertSentenceEmbeddings from rufimelo +author: John Snow Labs +name: sent_legal_bertimbau_large_pipeline +date: 2024-09-14 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_legal_bertimbau_large_pipeline` is a Portuguese model originally trained by rufimelo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_legal_bertimbau_large_pipeline_pt_5.5.0_3.0_1726337647305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_legal_bertimbau_large_pipeline_pt_5.5.0_3.0_1726337647305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_legal_bertimbau_large_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_legal_bertimbau_large_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_legal_bertimbau_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/rufimelo/Legal-BERTimbau-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-shami2english_en.md b/docs/_posts/ahmedlone127/2024-09-14-shami2english_en.md new file mode 100644 index 00000000000000..1c515c8b528154 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-shami2english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English shami2english MarianTransformer from guymorlan +author: John Snow Labs +name: shami2english +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`shami2english` is a English model originally trained by guymorlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/shami2english_en_5.5.0_3.0_1726351819294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/shami2english_en_5.5.0_3.0_1726351819294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("shami2english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("shami2english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
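+
+After `transform`, the English translations are stored in the `translation` column, one entry per detected sentence. A quick way to flatten them from the `pipelineDF` built above (a real input should of course be Levantine Arabic text rather than the English placeholder):
+
+```python
+pipelineDF.selectExpr("explode(translation.result) as english_text").show(truncate = False)
+```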
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|shami2english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|527.7 MB| + +## References + +https://huggingface.co/guymorlan/Shami2English \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-shami2english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-shami2english_pipeline_en.md new file mode 100644 index 00000000000000..ecb8de0b8e37c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-shami2english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English shami2english_pipeline pipeline MarianTransformer from guymorlan +author: John Snow Labs +name: shami2english_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`shami2english_pipeline` is a English model originally trained by guymorlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/shami2english_pipeline_en_5.5.0_3.0_1726351842700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/shami2english_pipeline_en_5.5.0_3.0_1726351842700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("shami2english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("shami2english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|shami2english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.2 MB| + +## References + +https://huggingface.co/guymorlan/Shami2English + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-small_finetuned_squadv2_en.md b/docs/_posts/ahmedlone127/2024-09-14-small_finetuned_squadv2_en.md new file mode 100644 index 00000000000000..483369d2c81c7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-small_finetuned_squadv2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English small_finetuned_squadv2 RoBertaForQuestionAnswering from dungquarkquark +author: John Snow Labs +name: small_finetuned_squadv2 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small_finetuned_squadv2` is a English model originally trained by dungquarkquark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small_finetuned_squadv2_en_5.5.0_3.0_1726342659983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small_finetuned_squadv2_en_5.5.0_3.0_1726342659983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("small_finetuned_squadv2","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("small_finetuned_squadv2", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
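+
+The predicted span lands in the `answer` column of `pipelineDF`; pulling the answer text back out is a one-liner:
+
+```python
+pipelineDF.select("answer.result").show(truncate = False)
+```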
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small_finetuned_squadv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.5 MB| + +## References + +https://huggingface.co/dungquarkquark/small-finetuned-squadv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-spark_name_french_tonga_tonga_islands_english_en.md b/docs/_posts/ahmedlone127/2024-09-14-spark_name_french_tonga_tonga_islands_english_en.md new file mode 100644 index 00000000000000..cdebfb7851add5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-spark_name_french_tonga_tonga_islands_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spark_name_french_tonga_tonga_islands_english MarianTransformer from ihebaker10 +author: John Snow Labs +name: spark_name_french_tonga_tonga_islands_english +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spark_name_french_tonga_tonga_islands_english` is a English model originally trained by ihebaker10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spark_name_french_tonga_tonga_islands_english_en_5.5.0_3.0_1726351410478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spark_name_french_tonga_tonga_islands_english_en_5.5.0_3.0_1726351410478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("spark_name_french_tonga_tonga_islands_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("spark_name_french_tonga_tonga_islands_english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spark_name_french_tonga_tonga_islands_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/ihebaker10/spark-name-fr-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline_en.md new file mode 100644 index 00000000000000..8c205f51563d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline pipeline AlbertForSequenceClassification from reubenjohn +author: John Snow Labs +name: stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline` is a English model originally trained by reubenjohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline_en_5.5.0_3.0_1726308617312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline_en_5.5.0_3.0_1726308617312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
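+
+As with the other pretrained pipelines on this page, `df` is assumed to be a DataFrame with a `text` column. The sketch below also reads back the predicted label; the classifier's output column is assumed to be `class`, which is the usual convention in these pipelines, so inspect `annotations.columns` if it differs:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame(
+    [["How do I parse JSON in Python without external libraries?"]]
+).toDF("text")
+
+pipeline = PretrainedPipeline(
+    "stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline",
+    lang = "en")
+annotations = pipeline.transform(df)
+
+annotations.selectExpr("explode(`class`.result) as predicted_status").show(truncate = False)
+```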
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stack_overflow_open_status_classifier_portuguese_warm_unsupervised_supervised_120_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.3 MB| + +## References + +https://huggingface.co/reubenjohn/stack-overflow-open-status-classifier-pt-warm-unsupervised-supervised-120 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-tmp_trainer_en.md b/docs/_posts/ahmedlone127/2024-09-14-tmp_trainer_en.md new file mode 100644 index 00000000000000..34fe536dc7fbe9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-tmp_trainer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer DistilBertForTokenClassification from anyuanay +author: John Snow Labs +name: tmp_trainer +date: 2024-09-14 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer` is a English model originally trained by anyuanay. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_en_5.5.0_3.0_1726350887864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_en_5.5.0_3.0_1726350887864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("tmp_trainer","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification
+    .pretrained("tmp_trainer", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("I love spark-nlp").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|tmp_trainer|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[sentences]|
+|Output Labels:|[translation]|
+|Language:|en|
+|Size:|530.2 MB|
+
+## References
+
+https://huggingface.co/anyuanay/tmp_trainer
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-14-transcriber_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-transcriber_small_pipeline_en.md
new file mode 100644
index 00000000000000..f711154f737687
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-14-transcriber_small_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English transcriber_small_pipeline pipeline WhisperForCTC from mediaProcessing
+author: John Snow Labs
+name: transcriber_small_pipeline
+date: 2024-09-14
+tags: [en, open_source, pipeline, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `transcriber_small_pipeline` is an English model originally trained by mediaProcessing.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transcriber_small_pipeline_en_5.5.0_3.0_1726331431503.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transcriber_small_pipeline_en_5.5.0_3.0_1726331431503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("transcriber_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("transcriber_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
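+
+Whisper pipelines expect raw audio samples rather than text, so `df` needs an `audio_content` column of floats (16 kHz mono is what Whisper models are trained on). The sketch below uses `librosa` as an example decoder with a placeholder file path; both are assumptions of this example, as is the `text` output column name:
+
+```python
+import librosa
+from sparknlp.pretrained import PretrainedPipeline
+
+# decode a local file into a plain Python list of float samples at 16 kHz
+raw_floats, _ = librosa.load("path/to/clip.wav", sr = 16000)
+df = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("transcriber_small_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.select("text.result").show(truncate = False)
+```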
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transcriber_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mediaProcessing/Transcriber-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-tweets_financial_classifier_distilbert_dima806_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-tweets_financial_classifier_distilbert_dima806_pipeline_en.md new file mode 100644 index 00000000000000..4700fb6dcb6021 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-tweets_financial_classifier_distilbert_dima806_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tweets_financial_classifier_distilbert_dima806_pipeline pipeline BertForSequenceClassification from dima806 +author: John Snow Labs +name: tweets_financial_classifier_distilbert_dima806_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tweets_financial_classifier_distilbert_dima806_pipeline` is a English model originally trained by dima806. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tweets_financial_classifier_distilbert_dima806_pipeline_en_5.5.0_3.0_1726348239064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tweets_financial_classifier_distilbert_dima806_pipeline_en_5.5.0_3.0_1726348239064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tweets_financial_classifier_distilbert_dima806_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tweets_financial_classifier_distilbert_dima806_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tweets_financial_classifier_distilbert_dima806_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dima806/tweets-financial-classifier-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-vispell_small_v2_vi.md b/docs/_posts/ahmedlone127/2024-09-14-vispell_small_v2_vi.md new file mode 100644 index 00000000000000..bab49cf78991b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-vispell_small_v2_vi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Vietnamese vispell_small_v2 MarianTransformer from ademax +author: John Snow Labs +name: vispell_small_v2 +date: 2024-09-14 +tags: [vi, open_source, onnx, translation, marian] +task: Translation +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vispell_small_v2` is a Vietnamese model originally trained by ademax. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vispell_small_v2_vi_5.5.0_3.0_1726351248164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vispell_small_v2_vi_5.5.0_3.0_1726351248164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("vispell_small_v2","vi") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("vispell_small_v2","vi")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vispell_small_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|vi| +|Size:|425.9 MB| + +## References + +https://huggingface.co/ademax/vispell-small-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_anuj_small_malyalam_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_anuj_small_malyalam_final_pipeline_en.md new file mode 100644 index 00000000000000..fb253bf777c49b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_anuj_small_malyalam_final_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_anuj_small_malyalam_final_pipeline pipeline WhisperForCTC from Anujgr8 +author: John Snow Labs +name: whisper_anuj_small_malyalam_final_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_anuj_small_malyalam_final_pipeline` is a English model originally trained by Anujgr8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_anuj_small_malyalam_final_pipeline_en_5.5.0_3.0_1726322684153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_anuj_small_malyalam_final_pipeline_en_5.5.0_3.0_1726322684153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_anuj_small_malyalam_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_anuj_small_malyalam_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_anuj_small_malyalam_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Anujgr8/Whisper-Anuj-small-Malyalam-final + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_base_cantonese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_base_cantonese_pipeline_en.md new file mode 100644 index 00000000000000..db6b6ec6a60ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_base_cantonese_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_cantonese_pipeline pipeline WhisperForCTC from Oblivion208 +author: John Snow Labs +name: whisper_base_cantonese_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_cantonese_pipeline` is a English model originally trained by Oblivion208. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_cantonese_pipeline_en_5.5.0_3.0_1726358234595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_cantonese_pipeline_en_5.5.0_3.0_1726358234595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_cantonese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_cantonese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_cantonese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.6 MB| + +## References + +https://huggingface.co/Oblivion208/whisper-base-cantonese + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_base_kannada_kn.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_base_kannada_kn.md new file mode 100644 index 00000000000000..41838b02c8b1fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_base_kannada_kn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Kannada whisper_base_kannada WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_kannada +date: 2024-09-14 +tags: [kn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_kannada` is a Kannada model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_kannada_kn_5.5.0_3.0_1726332466891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_kannada_kn_5.5.0_3.0_1726332466891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# data: a DataFrame with an "audio_content" column holding float audio samples (see the sketch below)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_kannada","kn") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// data: a DataFrame with an "audio_content" column holding float audio samples
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_kannada", "kn")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
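+
+The example above leaves `data` undefined; it should be a DataFrame whose `audio_content` column holds float audio samples. A minimal sketch, using `librosa` purely as an example decoder and a placeholder path (Whisper expects 16 kHz mono):
+
+```python
+import librosa
+
+samples, _ = librosa.load("path/to/kannada_clip.wav", sr = 16000)
+data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+
+# with data defined, the pipeline above can be fitted and the transcript read back
+pipelineDF = pipeline.fit(data).transform(data)
+pipelineDF.select("text.result").show(truncate = False)
+```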
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_kannada| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|kn| +|Size:|643.7 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-kn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_base_portuguese_koliskos_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_base_portuguese_koliskos_pipeline_pt.md new file mode 100644 index 00000000000000..75b390d6d97647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_base_portuguese_koliskos_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_base_portuguese_koliskos_pipeline pipeline WhisperForCTC from koliskos +author: John Snow Labs +name: whisper_base_portuguese_koliskos_pipeline +date: 2024-09-14 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_portuguese_koliskos_pipeline` is a Portuguese model originally trained by koliskos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_koliskos_pipeline_pt_5.5.0_3.0_1726277860015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_koliskos_pipeline_pt_5.5.0_3.0_1726277860015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_portuguese_koliskos_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_portuguese_koliskos_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_portuguese_koliskos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|643.4 MB| + +## References + +https://huggingface.co/koliskos/whisper-base-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_big_kpn_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_big_kpn_en.md new file mode 100644 index 00000000000000..68566ae6b48b4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_big_kpn_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_big_kpn WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_big_kpn +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_big_kpn` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_big_kpn_en_5.5.0_3.0_1726285405619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_big_kpn_en_5.5.0_3.0_1726285405619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# data: a DataFrame with an "audio_content" column holding float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_big_kpn","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// data: a DataFrame with an "audio_content" column holding float audio samples
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_big_kpn", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_big_kpn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-big-kpn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_big_kpn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_big_kpn_pipeline_en.md new file mode 100644 index 00000000000000..6edec0c92dcffb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_big_kpn_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_big_kpn_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_big_kpn_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_big_kpn_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_big_kpn_pipeline_en_5.5.0_3.0_1726285503266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_big_kpn_pipeline_en_5.5.0_3.0_1726285503266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_big_kpn_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_big_kpn_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_big_kpn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-big-kpn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_gujarati_base_pipeline_gu.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_gujarati_base_pipeline_gu.md new file mode 100644 index 00000000000000..d4141ef6b1ef54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_gujarati_base_pipeline_gu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Gujarati whisper_gujarati_base_pipeline pipeline WhisperForCTC from vasista22 +author: John Snow Labs +name: whisper_gujarati_base_pipeline +date: 2024-09-14 +tags: [gu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: gu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_gujarati_base_pipeline` is a Gujarati model originally trained by vasista22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_gujarati_base_pipeline_gu_5.5.0_3.0_1726321509070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_gujarati_base_pipeline_gu_5.5.0_3.0_1726321509070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_gujarati_base_pipeline", lang = "gu")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_gujarati_base_pipeline", lang = "gu")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_gujarati_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gu| +|Size:|643.2 MB| + +## References + +https://huggingface.co/vasista22/whisper-gujarati-base + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_hf_whispy_it.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_hf_whispy_it.md new file mode 100644 index 00000000000000..6dd51c372ef4f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_hf_whispy_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_hf_whispy WhisperForCTC from whispy +author: John Snow Labs +name: whisper_hf_whispy +date: 2024-09-14 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_hf_whispy` is a Italian model originally trained by whispy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_hf_whispy_it_5.5.0_3.0_1726284233047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_hf_whispy_it_5.5.0_3.0_1726284233047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_hf_whispy","it") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_hf_whispy", "it")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
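+
+The example above expects a DataFrame `data` with one array of audio floats per row. A rough sketch of assembling such a DataFrame from a folder of recordings, assuming `librosa` is available and the `audio/` directory is hypothetical:
+
+```python
+import glob
+import librosa
+import sparknlp
+
+spark = sparknlp.start()
+
+# Decode each (hypothetical) WAV file to 16 kHz float samples, one row per file
+rows = [[librosa.load(path, sr=16000)[0].tolist()] for path in glob.glob("audio/*.wav")]
+data = spark.createDataFrame(rows).toDF("audio_content")
+```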
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_hf_whispy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/whispy/whisper_hf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_hf_whispy_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_hf_whispy_pipeline_it.md new file mode 100644 index 00000000000000..247d2f166f2363 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_hf_whispy_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_hf_whispy_pipeline pipeline WhisperForCTC from whispy +author: John Snow Labs +name: whisper_hf_whispy_pipeline +date: 2024-09-14 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_hf_whispy_pipeline` is a Italian model originally trained by whispy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_hf_whispy_pipeline_it_5.5.0_3.0_1726284309838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_hf_whispy_pipeline_it_5.5.0_3.0_1726284309838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_hf_whispy_pipeline", lang = "it")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_hf_whispy_pipeline", lang = "it")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_hf_whispy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/whispy/whisper_hf + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_medium_korean_v0_1_2_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_medium_korean_v0_1_2_pipeline_ko.md new file mode 100644 index 00000000000000..f0f3bdf7b98808 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_medium_korean_v0_1_2_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_medium_korean_v0_1_2_pipeline pipeline WhisperForCTC from morish +author: John Snow Labs +name: whisper_medium_korean_v0_1_2_pipeline +date: 2024-09-14 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_korean_v0_1_2_pipeline` is a Korean model originally trained by morish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_korean_v0_1_2_pipeline_ko_5.5.0_3.0_1726323151576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_korean_v0_1_2_pipeline_ko_5.5.0_3.0_1726323151576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_medium_korean_v0_1_2_pipeline", lang = "ko")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_medium_korean_v0_1_2_pipeline", lang = "ko")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_korean_v0_1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|4.8 GB| + +## References + +https://huggingface.co/morish/whisper-medium-ko-v0_1_2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_medium_uzbek_extra_dataset_v2_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_medium_uzbek_extra_dataset_v2_en.md new file mode 100644 index 00000000000000..b2b0d20fa12836 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_medium_uzbek_extra_dataset_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_uzbek_extra_dataset_v2 WhisperForCTC from blackhole33 +author: John Snow Labs +name: whisper_medium_uzbek_extra_dataset_v2 +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_uzbek_extra_dataset_v2` is a English model originally trained by blackhole33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_uzbek_extra_dataset_v2_en_5.5.0_3.0_1726333473752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_uzbek_extra_dataset_v2_en_5.5.0_3.0_1726333473752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_uzbek_extra_dataset_v2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_uzbek_extra_dataset_v2", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_uzbek_extra_dataset_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/blackhole33/whisper-medium-uz-extra_dataset.v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small3_italian_it.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small3_italian_it.md new file mode 100644 index 00000000000000..90115c3efbbe15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small3_italian_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_small3_italian WhisperForCTC from luigisaetta +author: John Snow Labs +name: whisper_small3_italian +date: 2024-09-14 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small3_italian` is a Italian model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small3_italian_it_5.5.0_3.0_1726283662299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small3_italian_it_5.5.0_3.0_1726283662299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small3_italian","it") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small3_italian", "it")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small3_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/luigisaetta/whisper-small3-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cantonese_07_05_2024_2200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cantonese_07_05_2024_2200_pipeline_en.md new file mode 100644 index 00000000000000..ddcc23332b6655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cantonese_07_05_2024_2200_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_cantonese_07_05_2024_2200_pipeline pipeline WhisperForCTC from poppysmickarlili +author: John Snow Labs +name: whisper_small_cantonese_07_05_2024_2200_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cantonese_07_05_2024_2200_pipeline` is a English model originally trained by poppysmickarlili. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_07_05_2024_2200_pipeline_en_5.5.0_3.0_1726284959643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_07_05_2024_2200_pipeline_en_5.5.0_3.0_1726284959643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_cantonese_07_05_2024_2200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_cantonese_07_05_2024_2200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cantonese_07_05_2024_2200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/poppysmickarlili/whisper-small-cantonese_07-05-2024-2200 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cebtoeng_hi.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cebtoeng_hi.md new file mode 100644 index 00000000000000..6df382af594a22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cebtoeng_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_cebtoeng WhisperForCTC from ahoka +author: John Snow Labs +name: whisper_small_cebtoeng +date: 2024-09-14 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cebtoeng` is a Hindi model originally trained by ahoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cebtoeng_hi_5.5.0_3.0_1726275670767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cebtoeng_hi_5.5.0_3.0_1726275670767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_cebtoeng","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_cebtoeng", "hi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cebtoeng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.1 GB| + +## References + +https://huggingface.co/ahoka/whisper-small-cebToEng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_divehi_blackbook_lab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_divehi_blackbook_lab_pipeline_en.md new file mode 100644 index 00000000000000..8587489ea35198 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_divehi_blackbook_lab_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_blackbook_lab_pipeline pipeline WhisperForCTC from blackbook-lab +author: John Snow Labs +name: whisper_small_divehi_blackbook_lab_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_blackbook_lab_pipeline` is a English model originally trained by blackbook-lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_blackbook_lab_pipeline_en_5.5.0_3.0_1726278506763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_blackbook_lab_pipeline_en_5.5.0_3.0_1726278506763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_divehi_blackbook_lab_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_divehi_blackbook_lab_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_blackbook_lab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/blackbook-lab/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_divehi_c14210097_dv.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_divehi_c14210097_dv.md new file mode 100644 index 00000000000000..fe9ac990c6d744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_divehi_c14210097_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_c14210097 WhisperForCTC from c14210097 +author: John Snow Labs +name: whisper_small_divehi_c14210097 +date: 2024-09-14 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_c14210097` is a Dhivehi, Divehi, Maldivian model originally trained by c14210097. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_c14210097_dv_5.5.0_3.0_1726328567040.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_c14210097_dv_5.5.0_3.0_1726328567040.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_c14210097","dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_c14210097", "dv")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_c14210097| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/c14210097/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_en.md new file mode 100644 index 00000000000000..d309b2752a422c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_fine_tuned_russian WhisperForCTC from artyomboyko +author: John Snow Labs +name: whisper_small_fine_tuned_russian +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_fine_tuned_russian` is a English model originally trained by artyomboyko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_russian_en_5.5.0_3.0_1726279244622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_russian_en_5.5.0_3.0_1726279244622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_fine_tuned_russian","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_fine_tuned_russian", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
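+
+The fitted `pipelineModel` can be persisted with standard Spark ML utilities and reloaded later without rebuilding the stages; a short sketch (the local path is illustrative):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline locally and reload it in a later session
+pipelineModel.write().overwrite().save("/tmp/whisper_small_fine_tuned_russian_pipeline")
+reloaded = PipelineModel.load("/tmp/whisper_small_fine_tuned_russian_pipeline")
+```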
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_fine_tuned_russian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/artyomboyko/whisper-small-fine_tuned-ru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline_en.md new file mode 100644 index 00000000000000..2e1aab08b72238 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline pipeline WhisperForCTC from youngsangroh +author: John Snow Labs +name: whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline` is a English model originally trained by youngsangroh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline_en_5.5.0_3.0_1726299019763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline_en_5.5.0_3.0_1726299019763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_finetuned_atco2_asr_atcosim_youngsangroh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/youngsangroh/whisper-small-finetuned-atco2-asr-atcosim + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finetuned_hindi_commonvoice_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finetuned_hindi_commonvoice_pipeline_hi.md new file mode 100644 index 00000000000000..1c0a8a1e2c37a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finetuned_hindi_commonvoice_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_finetuned_hindi_commonvoice_pipeline pipeline WhisperForCTC from suvrobaner +author: John Snow Labs +name: whisper_small_finetuned_hindi_commonvoice_pipeline +date: 2024-09-14 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_finetuned_hindi_commonvoice_pipeline` is a Hindi model originally trained by suvrobaner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_finetuned_hindi_commonvoice_pipeline_hi_5.5.0_3.0_1726332493627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_finetuned_hindi_commonvoice_pipeline_hi_5.5.0_3.0_1726332493627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_finetuned_hindi_commonvoice_pipeline", lang = "hi")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_finetuned_hindi_commonvoice_pipeline", lang = "hi")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_finetuned_hindi_commonvoice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/suvrobaner/whisper-small-finetuned-hi-commonvoice + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_sandy_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_sandy_pipeline_id.md new file mode 100644 index 00000000000000..fbbfd89355db49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_sandy_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian whisper_small_indonesian_sandy_pipeline pipeline WhisperForCTC from Sandrian +author: John Snow Labs +name: whisper_small_indonesian_sandy_pipeline +date: 2024-09-14 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_sandy_pipeline` is a Indonesian model originally trained by Sandrian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_sandy_pipeline_id_5.5.0_3.0_1726275069819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_sandy_pipeline_id_5.5.0_3.0_1726275069819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_indonesian_sandy_pipeline", lang = "id")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_indonesian_sandy_pipeline", lang = "id")
+val annotations = pipeline.transform(df)
+
+```
+</div>
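+
+To confirm what the downloaded pipeline contains (the stages listed under "Included Models" below), the wrapped Spark ML `PipelineModel` can be inspected; a quick sketch, assuming the pipeline has already been created as above and is exposed through `pipeline.model`:
+
+```python
+# The PretrainedPipeline wraps a PipelineModel; its stages are the fitted annotators
+for stage in pipeline.model.stages:
+    print(stage)
+```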
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_sandy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Sandrian/whisper-small-id-sandy + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_kannada_pipeline_kn.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_kannada_pipeline_kn.md new file mode 100644 index 00000000000000..bd38b2d20b4c11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_kannada_pipeline_kn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Kannada whisper_small_kannada_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_small_kannada_pipeline +date: 2024-09-14 +tags: [kn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kannada_pipeline` is a Kannada model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kannada_pipeline_kn_5.5.0_3.0_1726285831428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kannada_pipeline_kn_5.5.0_3.0_1726285831428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_kannada_pipeline", lang = "kn")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_kannada_pipeline", lang = "kn")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kannada_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|kn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/parambharat/whisper-small-kn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_kinyarwanda_pipeline_rw.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_kinyarwanda_pipeline_rw.md new file mode 100644 index 00000000000000..1359a26833242e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_kinyarwanda_pipeline_rw.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Kinyarwanda whisper_small_kinyarwanda_pipeline pipeline WhisperForCTC from mbazaNLP +author: John Snow Labs +name: whisper_small_kinyarwanda_pipeline +date: 2024-09-14 +tags: [rw, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: rw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kinyarwanda_pipeline` is a Kinyarwanda model originally trained by mbazaNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kinyarwanda_pipeline_rw_5.5.0_3.0_1726276337686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kinyarwanda_pipeline_rw_5.5.0_3.0_1726276337686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_kinyarwanda_pipeline", lang = "rw")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_kinyarwanda_pipeline", lang = "rw")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kinyarwanda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|rw| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mbazaNLP/Whisper-Small-Kinyarwanda + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_portuguese_1000h_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_portuguese_1000h_pipeline_en.md new file mode 100644 index 00000000000000..b8e309d44962f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_portuguese_1000h_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_portuguese_1000h_pipeline pipeline WhisperForCTC from fsicoli +author: John Snow Labs +name: whisper_small_portuguese_1000h_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_1000h_pipeline` is a English model originally trained by fsicoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1000h_pipeline_en_5.5.0_3.0_1726323238104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1000h_pipeline_en_5.5.0_3.0_1726323238104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_portuguese_1000h_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_portuguese_1000h_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_1000h_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/fsicoli/whisper-small-pt-1000h + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_russian_1k_steps_ru.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_russian_1k_steps_ru.md new file mode 100644 index 00000000000000..13aa3d90276106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_russian_1k_steps_ru.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Russian whisper_small_russian_1k_steps WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: whisper_small_russian_1k_steps +date: 2024-09-14 +tags: [ru, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_1k_steps` is a Russian model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_1k_steps_ru_5.5.0_3.0_1726329234764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_1k_steps_ru_5.5.0_3.0_1726329234764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_russian_1k_steps","ru") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_russian_1k_steps", "ru")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_1k_steps| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ru| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanchit-gandhi/whisper-small-ru-1k-steps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_russian_ord_0_9_0_1_ru.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_russian_ord_0_9_0_1_ru.md new file mode 100644 index 00000000000000..9f389594031d21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_russian_ord_0_9_0_1_ru.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Russian whisper_small_russian_ord_0_9_0_1 WhisperForCTC from mizoru +author: John Snow Labs +name: whisper_small_russian_ord_0_9_0_1 +date: 2024-09-14 +tags: [ru, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_ord_0_9_0_1` is a Russian model originally trained by mizoru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_0_9_0_1_ru_5.5.0_3.0_1726326153217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_0_9_0_1_ru_5.5.0_3.0_1726326153217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_russian_ord_0_9_0_1","ru") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_russian_ord_0_9_0_1", "ru")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_ord_0_9_0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ru| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mizoru/whisper-small-ru-ORD_0.9_0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_urdu_omar47_pipeline_ur.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_urdu_omar47_pipeline_ur.md new file mode 100644 index 00000000000000..9efb503de759e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_urdu_omar47_pipeline_ur.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Urdu whisper_small_urdu_omar47_pipeline pipeline WhisperForCTC from omar47 +author: John Snow Labs +name: whisper_small_urdu_omar47_pipeline +date: 2024-09-14 +tags: [ur, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ur +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_urdu_omar47_pipeline` is a Urdu model originally trained by omar47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_urdu_omar47_pipeline_ur_5.5.0_3.0_1726297318441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_urdu_omar47_pipeline_ur_5.5.0_3.0_1726297318441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_urdu_omar47_pipeline", lang = "ur")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_urdu_omar47_pipeline", lang = "ur")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_urdu_omar47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ur| +|Size:|1.7 GB| + +## References + +https://huggingface.co/omar47/whisper-small-ur + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_synthesized_turkish_2_hour_hlr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_synthesized_turkish_2_hour_hlr_pipeline_en.md new file mode 100644 index 00000000000000..59b20e9fe70f91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_synthesized_turkish_2_hour_hlr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_synthesized_turkish_2_hour_hlr_pipeline pipeline WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_synthesized_turkish_2_hour_hlr_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_synthesized_turkish_2_hour_hlr_pipeline` is a English model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_2_hour_hlr_pipeline_en_5.5.0_3.0_1726328582146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_2_hour_hlr_pipeline_en_5.5.0_3.0_1726328582146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_synthesized_turkish_2_hour_hlr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// "df" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_synthesized_turkish_2_hour_hlr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_synthesized_turkish_2_hour_hlr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-synthesized-turkish-2-hour-hlr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_chinese_cn_lr4_b16_zh.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_chinese_cn_lr4_b16_zh.md new file mode 100644 index 00000000000000..ee30c5542e538e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_chinese_cn_lr4_b16_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_tiny_chinese_cn_lr4_b16 WhisperForCTC from VingeNie +author: John Snow Labs +name: whisper_tiny_chinese_cn_lr4_b16 +date: 2024-09-14 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_cn_lr4_b16` is a Chinese model originally trained by VingeNie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_b16_zh_5.5.0_3.0_1726299141652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_b16_zh_5.5.0_3.0_1726299141652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_cn_lr4_b16","zh") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with raw audio floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_cn_lr4_b16", "zh")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_cn_lr4_b16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|389.1 MB| + +## References + +https://huggingface.co/VingeNie/whisper-tiny-zh_CN_lr4_b16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_paavni_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_paavni_en.md new file mode 100644 index 00000000000000..caae9b35678645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_paavni_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_divehi_paavni WhisperForCTC from paavni +author: John Snow Labs +name: whisper_tiny_divehi_paavni +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_paavni` is a English model originally trained by paavni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_paavni_en_5.5.0_3.0_1726325207977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_paavni_en_5.5.0_3.0_1726325207977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# `data` is assumed to be a DataFrame with an "audio_content" column holding the
# raw audio samples as an array of floats.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_paavni","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_paavni", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_paavni| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/paavni/whisper-tiny-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_oshiwambo_speech_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_oshiwambo_speech_en.md new file mode 100644 index 00000000000000..40e32b6be0a74d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_oshiwambo_speech_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_oshiwambo_speech WhisperForCTC from jpandeinge +author: John Snow Labs +name: whisper_tiny_oshiwambo_speech +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_oshiwambo_speech` is a English model originally trained by jpandeinge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_oshiwambo_speech_en_5.5.0_3.0_1726325032931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_oshiwambo_speech_en_5.5.0_3.0_1726325032931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# `data` is assumed to be a DataFrame with an "audio_content" column holding the
# raw audio samples as an array of floats.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_oshiwambo_speech","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_oshiwambo_speech", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_oshiwambo_speech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/jpandeinge/whisper-tiny-oshiwambo-speech \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-wikitext_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-wikitext_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..8deda835c40e49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-wikitext_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wikitext_roberta_base_pipeline pipeline RoBertaEmbeddings from gary109 +author: John Snow Labs +name: wikitext_roberta_base_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikitext_roberta_base_pipeline` is a English model originally trained by gary109. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikitext_roberta_base_pipeline_en_5.5.0_3.0_1726300195373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikitext_roberta_base_pipeline_en_5.5.0_3.0_1726300195373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wikitext_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wikitext_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
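+The `df` passed to `transform` above is not defined. A minimal sketch, assuming the pipeline's DocumentAssembler reads its input from a `text` column (as in the other examples in these docs):

```python
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

# Inspect the output columns the pipeline produced (document, token, embeddings, ...).
annotations.printSchema()
```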
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikitext_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/gary109/wikitext_roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_french_kata958_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_french_kata958_pipeline_en.md new file mode 100644 index 00000000000000..e5e18e3c414987 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_french_kata958_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kata958_pipeline pipeline XlmRoBertaForTokenClassification from kata958 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kata958_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_kata958_pipeline` is a English model originally trained by kata958. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kata958_pipeline_en_5.5.0_3.0_1726290266719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kata958_pipeline_en_5.5.0_3.0_1726290266719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kata958_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kata958_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kata958_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/kata958/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_ysige_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_ysige_pipeline_en.md new file mode 100644 index 00000000000000..5cce775c35b2be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_ysige_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ysige_pipeline pipeline XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ysige_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_ysige_pipeline` is a English model originally trained by ysige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ysige_pipeline_en_5.5.0_3.0_1726345541764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ysige_pipeline_en_5.5.0_3.0_1726345541764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ysige_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ysige_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ysige_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_twitter_indonesia_sarcastic_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_twitter_indonesia_sarcastic_en.md new file mode 100644 index 00000000000000..98135d3dd9aa66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_twitter_indonesia_sarcastic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_twitter_indonesia_sarcastic XlmRoBertaForSequenceClassification from w11wo +author: John Snow Labs +name: xlm_roberta_base_twitter_indonesia_sarcastic +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_twitter_indonesia_sarcastic` is a English model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_twitter_indonesia_sarcastic_en_5.5.0_3.0_1726317287314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_twitter_indonesia_sarcastic_en_5.5.0_3.0_1726317287314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_twitter_indonesia_sarcastic","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_twitter_indonesia_sarcastic", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
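+After running the pipeline above, the predicted label is stored as an annotation in the `class` column; one way to inspect it:

```python
# Each row carries the input text and the predicted label(s) for it.
pipelineDF.select("text", "class.result").show(truncate=False)
```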
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_twitter_indonesia_sarcastic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|808.5 MB| + +## References + +https://huggingface.co/w11wo/xlm-roberta-base-twitter-indonesia-sarcastic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-4_datasets_fake_news_with_balanced_with_raw_data_en.md b/docs/_posts/ahmedlone127/2024-09-15-4_datasets_fake_news_with_balanced_with_raw_data_en.md new file mode 100644 index 00000000000000..55cdd23746353a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-4_datasets_fake_news_with_balanced_with_raw_data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 4_datasets_fake_news_with_balanced_with_raw_data DistilBertForSequenceClassification from littlepinhorse +author: John Snow Labs +name: 4_datasets_fake_news_with_balanced_with_raw_data +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`4_datasets_fake_news_with_balanced_with_raw_data` is a English model originally trained by littlepinhorse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_balanced_with_raw_data_en_5.5.0_3.0_1726385011004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_balanced_with_raw_data_en_5.5.0_3.0_1726385011004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("4_datasets_fake_news_with_balanced_with_raw_data","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("4_datasets_fake_news_with_balanced_with_raw_data", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|4_datasets_fake_news_with_balanced_with_raw_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/littlepinhorse/4_datasets_fake_news_with_Balanced_With_Raw_Data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-7eg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-7eg_pipeline_en.md new file mode 100644 index 00000000000000..caeaa9cb77c8e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-7eg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 7eg_pipeline pipeline RoBertaForSequenceClassification from aloxatel +author: John Snow Labs +name: 7eg_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`7eg_pipeline` is a English model originally trained by aloxatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/7eg_pipeline_en_5.5.0_3.0_1726402231817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/7eg_pipeline_en_5.5.0_3.0_1726402231817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("7eg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("7eg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|7eg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/aloxatel/7EG + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-acrossapps_ndd_dimeshift_test_content_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-acrossapps_ndd_dimeshift_test_content_pipeline_en.md new file mode 100644 index 00000000000000..af016b918a5b8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-acrossapps_ndd_dimeshift_test_content_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English acrossapps_ndd_dimeshift_test_content_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: acrossapps_ndd_dimeshift_test_content_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`acrossapps_ndd_dimeshift_test_content_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/acrossapps_ndd_dimeshift_test_content_pipeline_en_5.5.0_3.0_1726393560623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/acrossapps_ndd_dimeshift_test_content_pipeline_en_5.5.0_3.0_1726393560623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("acrossapps_ndd_dimeshift_test_content_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("acrossapps_ndd_dimeshift_test_content_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|acrossapps_ndd_dimeshift_test_content_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/ACROSSAPPS_NDD-dimeshift_test-content + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-albert_large_v2weighted_hoax_classifier_definition_en.md b/docs/_posts/ahmedlone127/2024-09-15-albert_large_v2weighted_hoax_classifier_definition_en.md new file mode 100644 index 00000000000000..b24fbf48bbbc39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-albert_large_v2weighted_hoax_classifier_definition_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_large_v2weighted_hoax_classifier_definition AlbertForSequenceClassification from research-dump +author: John Snow Labs +name: albert_large_v2weighted_hoax_classifier_definition +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_large_v2weighted_hoax_classifier_definition` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_large_v2weighted_hoax_classifier_definition_en_5.5.0_3.0_1726395860869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_large_v2weighted_hoax_classifier_definition_en_5.5.0_3.0_1726395860869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_large_v2weighted_hoax_classifier_definition","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_large_v2weighted_hoax_classifier_definition", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_large_v2weighted_hoax_classifier_definition| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|66.7 MB| + +## References + +https://huggingface.co/research-dump/albert-large-v2weighted_hoax_classifier_definition \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-amazonbaby5000_emotional_check_38sebastian_en.md b/docs/_posts/ahmedlone127/2024-09-15-amazonbaby5000_emotional_check_38sebastian_en.md new file mode 100644 index 00000000000000..29326dd24b843d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-amazonbaby5000_emotional_check_38sebastian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amazonbaby5000_emotional_check_38sebastian DistilBertForSequenceClassification from 38sebastian +author: John Snow Labs +name: amazonbaby5000_emotional_check_38sebastian +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazonbaby5000_emotional_check_38sebastian` is a English model originally trained by 38sebastian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazonbaby5000_emotional_check_38sebastian_en_5.5.0_3.0_1726393867121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazonbaby5000_emotional_check_38sebastian_en_5.5.0_3.0_1726393867121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazonbaby5000_emotional_check_38sebastian","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazonbaby5000_emotional_check_38sebastian", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazonbaby5000_emotional_check_38sebastian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/38sebastian/amazonbaby5000_emotional_check \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-amazonbaby_5000_jxptlr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-amazonbaby_5000_jxptlr_pipeline_en.md new file mode 100644 index 00000000000000..51af27133746df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-amazonbaby_5000_jxptlr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazonbaby_5000_jxptlr_pipeline pipeline DistilBertForSequenceClassification from Jxptlr +author: John Snow Labs +name: amazonbaby_5000_jxptlr_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazonbaby_5000_jxptlr_pipeline` is a English model originally trained by Jxptlr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazonbaby_5000_jxptlr_pipeline_en_5.5.0_3.0_1726406573847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazonbaby_5000_jxptlr_pipeline_en_5.5.0_3.0_1726406573847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazonbaby_5000_jxptlr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazonbaby_5000_jxptlr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazonbaby_5000_jxptlr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jxptlr/amazonbaby-5000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad_en.md new file mode 100644 index 00000000000000..078fef34006a95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad_en_5.5.0_3.0_1726364080794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad_en_5.5.0_3.0_1726364080794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
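+The extracted answer span ends up in the `answer` column; one way to read it back next to the inputs:

```python
# `answer.result` holds the predicted answer text for each question/context pair.
pipelineDF.select("question", "context", "answer.result").show(truncate=False)
```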
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_aochildes_french_with_masking_seed6_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_aochildes-french-with-Masking-seed6-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_base_uncased_itr1_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_base_uncased_itr1_en.md new file mode 100644 index 00000000000000..3e387fe1b3d02b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_base_uncased_itr1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_itr1 DistilBertForSequenceClassification from MeanBean-05 +author: John Snow Labs +name: bert_base_uncased_itr1 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_itr1` is a English model originally trained by MeanBean-05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_itr1_en_5.5.0_3.0_1726393890099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_itr1_en_5.5.0_3.0_1726393890099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_itr1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_itr1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_itr1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MeanBean-05/bert-base-uncased-itr1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_finetuned_squad_accelerate22_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_finetuned_squad_accelerate22_pipeline_en.md new file mode 100644 index 00000000000000..8085c6f9ce498f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_finetuned_squad_accelerate22_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_accelerate22_pipeline pipeline RoBertaForQuestionAnswering from makdong +author: John Snow Labs +name: bert_finetuned_squad_accelerate22_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_accelerate22_pipeline` is a English model originally trained by makdong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_accelerate22_pipeline_en_5.5.0_3.0_1726369263653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_accelerate22_pipeline_en_5.5.0_3.0_1726369263653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_accelerate22_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_accelerate22_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_accelerate22_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.9 MB| + +## References + +https://huggingface.co/makdong/bert-finetuned-squad-accelerate22 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_one_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_one_pipeline_en.md new file mode 100644 index 00000000000000..eae9d6ee82c79b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_one_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_one_pipeline pipeline BertEmbeddings from emma7897 +author: John Snow Labs +name: bert_one_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_one_pipeline` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_one_pipeline_en_5.5.0_3.0_1726399280924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_one_pipeline_en_5.5.0_3.0_1726399280924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_one_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_one_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_one_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/emma7897/bert_one + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline_en.md new file mode 100644 index 00000000000000..767f54c25fffb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline_en_5.5.0_3.0_1726385009620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline_en_5.5.0_3.0_1726385009620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_stringmatcher_newdataset_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-stringMatcher-newDataset_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_model_andrew6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_model_andrew6_pipeline_en.md new file mode 100644 index 00000000000000..c73b081c995d18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_model_andrew6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_andrew6_pipeline pipeline DistilBertForSequenceClassification from andrew6 +author: John Snow Labs +name: burmese_awesome_model_andrew6_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_andrew6_pipeline` is a English model originally trained by andrew6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_andrew6_pipeline_en_5.5.0_3.0_1726394198138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_andrew6_pipeline_en_5.5.0_3.0_1726394198138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_andrew6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_andrew6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_andrew6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andrew6/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_hark99_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_hark99_en.md new file mode 100644 index 00000000000000..a51fd38077cf11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_hark99_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_hark99 DistilBertForQuestionAnswering from hark99 +author: John Snow Labs +name: burmese_awesome_qa_model_hark99 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_hark99` is a English model originally trained by hark99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hark99_en_5.5.0_3.0_1726435176192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hark99_en_5.5.0_3.0_1726435176192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_hark99","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_hark99", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_hark99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/hark99/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_jibuaji_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_jibuaji_pipeline_en.md new file mode 100644 index 00000000000000..292f3d53675db3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_jibuaji_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_jibuaji_pipeline pipeline DistilBertForQuestionAnswering from JibuAji +author: John Snow Labs +name: burmese_awesome_qa_model_jibuaji_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_jibuaji_pipeline` is a English model originally trained by JibuAji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_jibuaji_pipeline_en_5.5.0_3.0_1726382693790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_jibuaji_pipeline_en_5.5.0_3.0_1726382693790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_jibuaji_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_jibuaji_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_jibuaji_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/JibuAji/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_ranamahdavi_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_ranamahdavi_en.md new file mode 100644 index 00000000000000..81368a4bf964fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_ranamahdavi_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ranamahdavi DistilBertForQuestionAnswering from ranamahdavi +author: John Snow Labs +name: burmese_awesome_qa_model_ranamahdavi +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ranamahdavi` is a English model originally trained by ranamahdavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ranamahdavi_en_5.5.0_3.0_1726435357176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ranamahdavi_en_5.5.0_3.0_1726435357176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ranamahdavi","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ranamahdavi", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
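+
+Once `pipelineDF` has been computed as above, the predicted answer text can be read from the `answer` annotation column; a short sketch:
+
+```python
+# Flatten the answer annotations and show the predicted span text.
+pipelineDF.selectExpr("explode(answer) as ans") \
+    .selectExpr("ans.result as answer_text") \
+    .show(truncate=False)
+```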
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ranamahdavi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ranamahdavi/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-codebert_base_mlm_microsoft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-codebert_base_mlm_microsoft_pipeline_en.md new file mode 100644 index 00000000000000..f86d84db09890b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-codebert_base_mlm_microsoft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English codebert_base_mlm_microsoft_pipeline pipeline RoBertaEmbeddings from microsoft +author: John Snow Labs +name: codebert_base_mlm_microsoft_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`codebert_base_mlm_microsoft_pipeline` is a English model originally trained by microsoft. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/codebert_base_mlm_microsoft_pipeline_en_5.5.0_3.0_1726383973898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/codebert_base_mlm_microsoft_pipeline_en_5.5.0_3.0_1726383973898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("codebert_base_mlm_microsoft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("codebert_base_mlm_microsoft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
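+
+The pipeline call above assumes a DataFrame `df` with a `text` column. A minimal sketch of preparing one and reading the token embeddings back out is shown below; the `embeddings` output column name is an assumption about how this exported pipeline names its RoBertaEmbeddings stage.
+
+```python
+# Hypothetical usage of the loaded embeddings pipeline on a code snippet.
+df = spark.createDataFrame([("def add(a, b): return a + b",)], ["text"])
+
+annotations = pipeline.transform(df)
+annotations.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=80)
+```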
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|codebert_base_mlm_microsoft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/microsoft/codebert-base-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_pipeline_en.md new file mode 100644 index 00000000000000..052d1314a90831 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cuad_distil_governing_law_cased_08_31_v1_pipeline pipeline DistilBertForQuestionAnswering from saraks +author: John Snow Labs +name: cuad_distil_governing_law_cased_08_31_v1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuad_distil_governing_law_cased_08_31_v1_pipeline` is a English model originally trained by saraks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuad_distil_governing_law_cased_08_31_v1_pipeline_en_5.5.0_3.0_1726382722606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuad_distil_governing_law_cased_08_31_v1_pipeline_en_5.5.0_3.0_1726382722606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cuad_distil_governing_law_cased_08_31_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cuad_distil_governing_law_cased_08_31_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuad_distil_governing_law_cased_08_31_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/saraks/cuad-distil-governing_law-cased-08-31-v1 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_en.md new file mode 100644 index 00000000000000..127f8f85ba73c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cuatr_distilbert DistilBertForSequenceClassification from chathuru +author: John Snow Labs +name: cuatr_distilbert +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuatr_distilbert` is a English model originally trained by chathuru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuatr_distilbert_en_5.5.0_3.0_1726394170741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuatr_distilbert_en_5.5.0_3.0_1726394170741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("cuatr_distilbert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("cuatr_distilbert", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
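+
+After the transform above, the predicted label for each row is available in the `class` annotation column; for example:
+
+```python
+# Show the input text next to the predicted label(s).
+pipelineDF.select("text", "class.result").show(truncate=False)
+```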
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuatr_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chathuru/CuATR-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-custom_peft_whiper_small_korean_v3_en.md b/docs/_posts/ahmedlone127/2024-09-15-custom_peft_whiper_small_korean_v3_en.md new file mode 100644 index 00000000000000..1d5c7aba10fd9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-custom_peft_whiper_small_korean_v3_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English custom_peft_whiper_small_korean_v3 WhisperForCTC from devkya +author: John Snow Labs +name: custom_peft_whiper_small_korean_v3 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custom_peft_whiper_small_korean_v3` is a English model originally trained by devkya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custom_peft_whiper_small_korean_v3_en_5.5.0_3.0_1726427461376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custom_peft_whiper_small_korean_v3_en_5.5.0_3.0_1726427461376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding arrays of raw audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("custom_peft_whiper_small_korean_v3","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column
+// holding arrays of raw audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("custom_peft_whiper_small_korean_v3", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
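+
+The examples above assume a DataFrame `data` with an `audio_content` column of raw audio samples. One way such a DataFrame might be prepared is sketched below; `librosa` and the local file name are illustrative assumptions, not part of the model card.
+
+```python
+# Hypothetical preparation of the `data` DataFrame used above.
+import librosa
+
+# Decode a local file to 16 kHz mono floats (the rate Whisper-style models expect).
+waveform, _ = librosa.load("sample.wav", sr=16000)
+
+data = spark.createDataFrame([(waveform.tolist(),)], ["audio_content"])
+```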
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custom_peft_whiper_small_korean_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/devkya/custom-peft-whiper-small-ko-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-db_mc2_3_3_en.md b/docs/_posts/ahmedlone127/2024-09-15-db_mc2_3_3_en.md new file mode 100644 index 00000000000000..544a2601ee3bff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-db_mc2_3_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English db_mc2_3_3 DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc2_3_3 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc2_3_3` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc2_3_3_en_5.5.0_3.0_1726366096965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc2_3_3_en_5.5.0_3.0_1726366096965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc2_3_3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc2_3_3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc2_3_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc2_3.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-dfm_ed2_en.md b/docs/_posts/ahmedlone127/2024-09-15-dfm_ed2_en.md new file mode 100644 index 00000000000000..cf3d3700a2311a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-dfm_ed2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dfm_ed2 BertForSequenceClassification from yemen2016 +author: John Snow Labs +name: dfm_ed2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dfm_ed2` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dfm_ed2_en_5.5.0_3.0_1726378497208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dfm_ed2_en_5.5.0_3.0_1726378497208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("dfm_ed2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("dfm_ed2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dfm_ed2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yemen2016/dfm_ED2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-dfm_ed2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-dfm_ed2_pipeline_en.md new file mode 100644 index 00000000000000..c6334a30b3e4ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-dfm_ed2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dfm_ed2_pipeline pipeline BertForSequenceClassification from yemen2016 +author: John Snow Labs +name: dfm_ed2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dfm_ed2_pipeline` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dfm_ed2_pipeline_en_5.5.0_3.0_1726378559685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dfm_ed2_pipeline_en_5.5.0_3.0_1726378559685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dfm_ed2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dfm_ed2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
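+
+For a quick check on a single sentence, the pretrained pipeline can also be used without building a DataFrame; the `class` key in the result dictionary is an assumption about this pipeline's output names.
+
+```python
+# Light-weight, single-string inference with the loaded pipeline.
+result = pipeline.annotate("I love spark-nlp")
+print(result.get("class"))
+```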
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dfm_ed2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yemen2016/dfm_ED2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_multilingual_cased_jaquad_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_multilingual_cased_jaquad_pipeline_xx.md new file mode 100644 index 00000000000000..c5f41da648cc79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_multilingual_cased_jaquad_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_jaquad_pipeline pipeline DistilBertForQuestionAnswering from cuongtk2002 +author: John Snow Labs +name: distilbert_base_multilingual_cased_jaquad_pipeline +date: 2024-09-15 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_jaquad_pipeline` is a Multilingual model originally trained by cuongtk2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_jaquad_pipeline_xx_5.5.0_3.0_1726382538372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_jaquad_pipeline_xx_5.5.0_3.0_1726382538372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_jaquad_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_jaquad_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_jaquad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|505.4 MB| + +## References + +https://huggingface.co/cuongtk2002/distilbert-base-multilingual-cased-JaQuAD + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_uncased_finetuned_squad_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_uncased_finetuned_squad_spanish_en.md new file mode 100644 index 00000000000000..aa463a58e92dec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_uncased_finetuned_squad_spanish_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_spanish_uncased_finetuned_squad_spanish DistilBertForQuestionAnswering from piturrolfio +author: John Snow Labs +name: distilbert_base_spanish_uncased_finetuned_squad_spanish +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_spanish_uncased_finetuned_squad_spanish` is a English model originally trained by piturrolfio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_uncased_finetuned_squad_spanish_en_5.5.0_3.0_1726435298852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_uncased_finetuned_squad_spanish_en_5.5.0_3.0_1726435298852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_spanish_uncased_finetuned_squad_spanish","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_spanish_uncased_finetuned_squad_spanish", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_spanish_uncased_finetuned_squad_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|250.2 MB| + +## References + +https://huggingface.co/piturrolfio/distilbert-base-spanish-uncased-finetuned-squad-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_emotion_ft_0416_chris2me_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_emotion_ft_0416_chris2me_en.md new file mode 100644 index 00000000000000..68cb7cd3beccd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_emotion_ft_0416_chris2me_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_chris2me DistilBertForSequenceClassification from Chris2me +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_chris2me +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_chris2me` is a English model originally trained by Chris2me. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_chris2me_en_5.5.0_3.0_1726393997126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_chris2me_en_5.5.0_3.0_1726393997126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_chris2me","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_chris2me", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_chris2me| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chris2me/distilbert-base-uncased_emotion_ft_0416 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_esperesa_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_esperesa_en.md new file mode 100644 index 00000000000000..67309b994265f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_esperesa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_esperesa DistilBertForSequenceClassification from esperesa +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_esperesa +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_esperesa` is a English model originally trained by esperesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_esperesa_en_5.5.0_3.0_1726366116229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_esperesa_en_5.5.0_3.0_1726366116229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_esperesa","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_esperesa", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_esperesa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/esperesa/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_deionk_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_deionk_en.md new file mode 100644 index 00000000000000..68f9c99654068c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_deionk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_deionk DistilBertForSequenceClassification from deionk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_deionk +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_deionk` is a English model originally trained by deionk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_deionk_en_5.5.0_3.0_1726366387587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_deionk_en_5.5.0_3.0_1726366387587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_deionk","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_deionk", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_deionk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deionk/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_deionk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_deionk_pipeline_en.md new file mode 100644 index 00000000000000..0696d4fef2eb89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_deionk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_deionk_pipeline pipeline DistilBertForSequenceClassification from deionk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_deionk_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_deionk_pipeline` is a English model originally trained by deionk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_deionk_pipeline_en_5.5.0_3.0_1726366399710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_deionk_pipeline_en_5.5.0_3.0_1726366399710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_deionk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_deionk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_deionk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deionk/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline_en.md new file mode 100644 index 00000000000000..2e8b7081cc2a29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline pipeline DistilBertForSequenceClassification from maydogdu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline` is a English model originally trained by maydogdu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline_en_5.5.0_3.0_1726366124178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline_en_5.5.0_3.0_1726366124178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_maydogdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/maydogdu/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_tagaway_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_tagaway_pipeline_en.md new file mode 100644 index 00000000000000..10754e93b89ef2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_tagaway_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_tagaway_pipeline pipeline DistilBertForSequenceClassification from tagaway +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_tagaway_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_tagaway_pipeline` is a English model originally trained by tagaway. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tagaway_pipeline_en_5.5.0_3.0_1726393787971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tagaway_pipeline_en_5.5.0_3.0_1726393787971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_tagaway_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_tagaway_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_tagaway_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tagaway/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_thomnis_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_thomnis_en.md new file mode 100644 index 00000000000000..bccbf3baa7cdd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_thomnis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thomnis DistilBertForSequenceClassification from thomnis +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thomnis +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thomnis` is a English model originally trained by thomnis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thomnis_en_5.5.0_3.0_1726366098677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thomnis_en_5.5.0_3.0_1726366098677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thomnis","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thomnis", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thomnis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thomnis/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emrqa_msquad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emrqa_msquad_pipeline_en.md new file mode 100644 index 00000000000000..2903070667e82f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emrqa_msquad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emrqa_msquad_pipeline pipeline DistilBertForQuestionAnswering from wieheistdu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emrqa_msquad_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emrqa_msquad_pipeline` is a English model originally trained by wieheistdu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emrqa_msquad_pipeline_en_5.5.0_3.0_1726382400882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emrqa_msquad_pipeline_en_5.5.0_3.0_1726382400882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emrqa_msquad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emrqa_msquad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emrqa_msquad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/wieheistdu/distilbert-base-uncased-finetuned-emrQA-msquad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_gogo0920_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_gogo0920_pipeline_en.md new file mode 100644 index 00000000000000..576796d4562197 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_gogo0920_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_gogo0920_pipeline pipeline DistilBertForQuestionAnswering from gogo0920 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_gogo0920_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_gogo0920_pipeline` is a English model originally trained by gogo0920. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_gogo0920_pipeline_en_5.5.0_3.0_1726434998362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_gogo0920_pipeline_en_5.5.0_3.0_1726434998362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_gogo0920_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_gogo0920_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_gogo0920_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/gogo0920/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_minji222_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_minji222_en.md new file mode 100644 index 00000000000000..d6a3c79b9c6785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_minji222_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_minji222 DistilBertForQuestionAnswering from minji222 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_minji222 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_minji222` is a English model originally trained by minji222. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_minji222_en_5.5.0_3.0_1726382538296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_minji222_en_5.5.0_3.0_1726382538296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_minji222","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_minji222", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_minji222| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/minji222/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_sasi1565_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_sasi1565_en.md new file mode 100644 index 00000000000000..1fab2574ef3bb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_sasi1565_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_sasi1565 DistilBertForQuestionAnswering from Sasi1565 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_sasi1565 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_sasi1565` is a English model originally trained by Sasi1565. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sasi1565_en_5.5.0_3.0_1726435322404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sasi1565_en_5.5.0_3.0_1726435322404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_sasi1565","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_sasi1565", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_sasi1565| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Sasi1565/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_sasi1565_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_sasi1565_pipeline_en.md new file mode 100644 index 00000000000000..ed8546cc5f9205 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_sasi1565_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_sasi1565_pipeline pipeline DistilBertForQuestionAnswering from Sasi1565 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_sasi1565_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_sasi1565_pipeline` is a English model originally trained by Sasi1565. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sasi1565_pipeline_en_5.5.0_3.0_1726435334503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sasi1565_pipeline_en_5.5.0_3.0_1726435334503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sasi1565_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sasi1565_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
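
The pretrained-pipeline snippets above assume a DataFrame `df` already exists. As a hedged sketch (the input column names are an assumption mirroring the stand-alone question-answering examples in this series, not something stated in this card), it could be prepared like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark NLP session.
spark = sparknlp.start()

# Assumed input schema: one question column and one context column.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sasi1565_pipeline", lang="en")
annotations = pipeline.transform(df)

# The output column name "answer" is assumed, as in the stand-alone examples.
annotations.select("answer.result").show(truncate=False)
```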
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_sasi1565_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Sasi1565/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_wendywangwww_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_wendywangwww_en.md new file mode 100644 index 00000000000000..635a1770196ad3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_wendywangwww_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_wendywangwww DistilBertForQuestionAnswering from wendywangwww +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_wendywangwww +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_wendywangwww` is a English model originally trained by wendywangwww. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_wendywangwww_en_5.5.0_3.0_1726435072168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_wendywangwww_en_5.5.0_3.0_1726435072168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_wendywangwww","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_wendywangwww", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_wendywangwww| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wendywangwww/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline_en.md new file mode 100644 index 00000000000000..fc981f1e219cfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline pipeline DistilBertForQuestionAnswering from wendywangwww +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline` is a English model originally trained by wendywangwww. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline_en_5.5.0_3.0_1726435084146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline_en_5.5.0_3.0_1726435084146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_wendywangwww_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wendywangwww/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline_en.md new file mode 100644 index 00000000000000..bbbff7eb7065c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline_en_5.5.0_3.0_1726406032186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline_en_5.5.0_3.0_1726406032186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
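
As with the other pretrained pipelines on this page, `df` is assumed to already exist. A hedged sketch of preparing it for this text-classification pipeline (the single `text` column mirrors the stand-alone DistilBertForSequenceClassification examples and is an assumption, not something stated in this card):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed single input column named "text".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline", lang="en")
annotations = pipeline.transform(df)

# Print the schema to see which output column the classifier writes, then select it.
annotations.printSchema()
```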
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1_PLPrefix0stlarge2_simsp400_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300_en.md new file mode 100644 index 00000000000000..3dfce27f603faf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300_en_5.5.0_3.0_1726405908442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300_en_5.5.0_3.0_1726405908442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
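
After `pipelineDF` has been computed, the predicted label sits in the `class` output column configured above. A minimal sketch, assuming the Python example has been run:

```python
# Sketch only: `class` is the classifier output column set in the example above.
# `result` holds the predicted label; `metadata` typically carries per-label scores.
pipelineDF.select("text", "class.result", "class.metadata").show(truncate=False)
```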
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st4sd_ut72ut5_plprefix0stlarge4_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st4sd_ut72ut5_PLPrefix0stlarge4_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..69938730180e55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline_en_5.5.0_3.0_1726365729373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline_en_5.5.0_3.0_1726365729373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut52ut1_plprefix0stlarge5_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut52ut1_PLPrefix0stlarge5_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_squad2_p90_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_squad2_p90_en.md new file mode 100644 index 00000000000000..f3facac8c94d08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_squad2_p90_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p90 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p90 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p90` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p90_en_5.5.0_3.0_1726435390343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p90_en_5.5.0_3.0_1726435390343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p90","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p90", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p90| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|121.9 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p90 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_squad2_p90_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_squad2_p90_pipeline_en.md new file mode 100644 index 00000000000000..3e6eb5de1936e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_squad2_p90_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p90_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p90_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p90_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p90_pipeline_en_5.5.0_3.0_1726435402557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p90_pipeline_en_5.5.0_3.0_1726435402557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p90_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p90_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p90_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|121.9 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p90 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_emotion_eserdy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_emotion_eserdy_pipeline_en.md new file mode 100644 index 00000000000000..f14efa3fd67fb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_emotion_eserdy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_eserdy_pipeline pipeline DistilBertForSequenceClassification from eserdy +author: John Snow Labs +name: distilbert_emotion_eserdy_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_eserdy_pipeline` is a English model originally trained by eserdy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_eserdy_pipeline_en_5.5.0_3.0_1726406577743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_eserdy_pipeline_en_5.5.0_3.0_1726406577743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_eserdy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_eserdy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_eserdy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eserdy/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finance_future_amounts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finance_future_amounts_pipeline_en.md new file mode 100644 index 00000000000000..c2c65d68199425 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finance_future_amounts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finance_future_amounts_pipeline pipeline DistilBertForSequenceClassification from finsynth +author: John Snow Labs +name: distilbert_finance_future_amounts_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finance_future_amounts_pipeline` is a English model originally trained by finsynth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finance_future_amounts_pipeline_en_5.5.0_3.0_1726365956985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finance_future_amounts_pipeline_en_5.5.0_3.0_1726365956985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finance_future_amounts_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finance_future_amounts_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finance_future_amounts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/finsynth/distilbert-finance-future-amounts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_custom_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_custom_en.md new file mode 100644 index 00000000000000..3ff87e614a5ee6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_custom_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_custom DistilBertForQuestionAnswering from Naima12 +author: John Snow Labs +name: distilbert_finetuned_custom +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_custom` is a English model originally trained by Naima12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_custom_en_5.5.0_3.0_1726435158440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_custom_en_5.5.0_3.0_1726435158440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_custom","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_custom", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_custom| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Naima12/distilbert-finetuned-custom \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_custom_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_custom_pipeline_en.md new file mode 100644 index 00000000000000..05fd0f5751b216 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_custom_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_custom_pipeline pipeline DistilBertForQuestionAnswering from Naima12 +author: John Snow Labs +name: distilbert_finetuned_custom_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_custom_pipeline` is a English model originally trained by Naima12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_custom_pipeline_en_5.5.0_3.0_1726435170486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_custom_pipeline_en_5.5.0_3.0_1726435170486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_custom_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_custom_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_custom_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Naima12/distilbert-finetuned-custom + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_model_01_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_model_01_en.md new file mode 100644 index 00000000000000..ea5d5193ad35d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_model_01_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_model_01 DistilBertForQuestionAnswering from Naima12 +author: John Snow Labs +name: distilbert_finetuned_model_01 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_model_01` is a English model originally trained by Naima12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_model_01_en_5.5.0_3.0_1726435442393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_model_01_en_5.5.0_3.0_1726435442393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_model_01","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_model_01", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_model_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Naima12/distilbert-finetuned-Model_01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_model_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_model_01_pipeline_en.md new file mode 100644 index 00000000000000..a8f32cea2bd994 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_model_01_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_model_01_pipeline pipeline DistilBertForQuestionAnswering from Naima12 +author: John Snow Labs +name: distilbert_finetuned_model_01_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_model_01_pipeline` is a English model originally trained by Naima12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_model_01_pipeline_en_5.5.0_3.0_1726435454548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_model_01_pipeline_en_5.5.0_3.0_1726435454548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_model_01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_model_01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_model_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Naima12/distilbert-finetuned-Model_01 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_squadv2_mf212_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_squadv2_mf212_en.md new file mode 100644 index 00000000000000..340faa6c3c6d2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_squadv2_mf212_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_mf212 DistilBertForQuestionAnswering from mf212 +author: John Snow Labs +name: distilbert_finetuned_squadv2_mf212 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_mf212` is a English model originally trained by mf212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_mf212_en_5.5.0_3.0_1726435452082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_mf212_en_5.5.0_3.0_1726435452082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squadv2_mf212","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squadv2_mf212", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_mf212| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/mf212/distilbert-finetuned-squadv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_squadv2_mf212_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_squadv2_mf212_pipeline_en.md new file mode 100644 index 00000000000000..feef919058d9ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_squadv2_mf212_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_mf212_pipeline pipeline DistilBertForQuestionAnswering from mf212 +author: John Snow Labs +name: distilbert_finetuned_squadv2_mf212_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_mf212_pipeline` is a English model originally trained by mf212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_mf212_pipeline_en_5.5.0_3.0_1726435464342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_mf212_pipeline_en_5.5.0_3.0_1726435464342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_squadv2_mf212_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_squadv2_mf212_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_mf212_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mf212/distilbert-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_sqv2_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_sqv2_en.md new file mode 100644 index 00000000000000..aa175b1a9a9fdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_sqv2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_sqv2 DistilBertForQuestionAnswering from Huyisbeee +author: John Snow Labs +name: distilbert_finetuned_sqv2 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_sqv2` is a English model originally trained by Huyisbeee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_sqv2_en_5.5.0_3.0_1726435033753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_sqv2_en_5.5.0_3.0_1726435033753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_sqv2","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_sqv2", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_sqv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Huyisbeee/distilbert-finetuned-sqv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline_en.md new file mode 100644 index 00000000000000..14697df9f05a9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline_en_5.5.0_3.0_1726406213722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline_en_5.5.0_3.0_1726406213722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_stsb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_pretrain_stsb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_scam_classifier_v1_2_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_scam_classifier_v1_2_en.md new file mode 100644 index 00000000000000..a6b221bedc6b5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_scam_classifier_v1_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_scam_classifier_v1_2 DistilBertForSequenceClassification from BothBosu +author: John Snow Labs +name: distilbert_scam_classifier_v1_2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_scam_classifier_v1_2` is a English model originally trained by BothBosu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_2_en_5.5.0_3.0_1726406550689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_2_en_5.5.0_3.0_1726406550689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_scam_classifier_v1_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_scam_classifier_v1_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_scam_classifier_v1_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BothBosu/distilbert-scam-classifier-v1.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilroberta_base_mrpc_glue_yoel_c_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilroberta_base_mrpc_glue_yoel_c_pipeline_en.md new file mode 100644 index 00000000000000..5de6e1ccb856e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilroberta_base_mrpc_glue_yoel_c_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_mrpc_glue_yoel_c_pipeline pipeline RoBertaForSequenceClassification from YoelCanaza +author: John Snow Labs +name: distilroberta_base_mrpc_glue_yoel_c_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mrpc_glue_yoel_c_pipeline` is a English model originally trained by YoelCanaza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_yoel_c_pipeline_en_5.5.0_3.0_1726439158457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_yoel_c_pipeline_en_5.5.0_3.0_1726439158457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_mrpc_glue_yoel_c_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_mrpc_glue_yoel_c_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
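+
+The snippet above assumes that `PretrainedPipeline` has been imported and that `df` is an existing Spark DataFrame with a `text` column. A minimal sketch of that setup, with `annotate()` shown as a quick alternative for single strings:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works as input.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilroberta_base_mrpc_glue_yoel_c_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+# For a single string, annotate() returns a plain Python dict of results.
+print(pipeline.annotate("I love spark-nlp"))
+```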
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mrpc_glue_yoel_c_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/YoelCanaza/distilroberta-base-mrpc-glue-yoel-c + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-dnabert_k07_w10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-dnabert_k07_w10_pipeline_en.md new file mode 100644 index 00000000000000..595ceffec54fa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-dnabert_k07_w10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dnabert_k07_w10_pipeline pipeline RoBertaEmbeddings from wjbmattingly +author: John Snow Labs +name: dnabert_k07_w10_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dnabert_k07_w10_pipeline` is a English model originally trained by wjbmattingly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dnabert_k07_w10_pipeline_en_5.5.0_3.0_1726383263920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dnabert_k07_w10_pipeline_en_5.5.0_3.0_1726383263920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dnabert_k07_w10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dnabert_k07_w10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dnabert_k07_w10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.9 MB| + +## References + +https://huggingface.co/wjbmattingly/dnaBERT-k07-w10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-dopamin_java_deprecation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-dopamin_java_deprecation_pipeline_en.md new file mode 100644 index 00000000000000..370e848fd01a92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-dopamin_java_deprecation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dopamin_java_deprecation_pipeline pipeline RoBertaForSequenceClassification from Fsoft-AIC +author: John Snow Labs +name: dopamin_java_deprecation_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dopamin_java_deprecation_pipeline` is a English model originally trained by Fsoft-AIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dopamin_java_deprecation_pipeline_en_5.5.0_3.0_1726439180368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dopamin_java_deprecation_pipeline_en_5.5.0_3.0_1726439180368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dopamin_java_deprecation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dopamin_java_deprecation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dopamin_java_deprecation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/Fsoft-AIC/dopamin-java-deprecation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-efficient_mlm_m0_15_801010_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-efficient_mlm_m0_15_801010_pipeline_en.md new file mode 100644 index 00000000000000..5eaca7e6e45acd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-efficient_mlm_m0_15_801010_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English efficient_mlm_m0_15_801010_pipeline pipeline RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_15_801010_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_15_801010_pipeline` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_15_801010_pipeline_en_5.5.0_3.0_1726413880895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_15_801010_pipeline_en_5.5.0_3.0_1726413880895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("efficient_mlm_m0_15_801010_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("efficient_mlm_m0_15_801010_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_15_801010_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|844.2 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.15-801010 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-enron_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-enron_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..da1b67414cbd5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-enron_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enron_distilbert_pipeline pipeline DistilBertForSequenceClassification from bvk +author: John Snow Labs +name: enron_distilbert_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enron_distilbert_pipeline` is a English model originally trained by bvk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enron_distilbert_pipeline_en_5.5.0_3.0_1726385313164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enron_distilbert_pipeline_en_5.5.0_3.0_1726385313164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enron_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enron_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enron_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bvk/ENRON-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_distilbert_isha31101999_en.md b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_distilbert_isha31101999_en.md new file mode 100644 index 00000000000000..b9c9e25c5dc399 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_distilbert_isha31101999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_distilbert_isha31101999 DistilBertForSequenceClassification from Isha31101999 +author: John Snow Labs +name: fine_tuned_distilbert_isha31101999 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_isha31101999` is a English model originally trained by Isha31101999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_isha31101999_en_5.5.0_3.0_1726394372338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_isha31101999_en_5.5.0_3.0_1726394372338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_distilbert_isha31101999","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_distilbert_isha31101999", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_isha31101999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Isha31101999/fine-tuned-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_distilbert_isha31101999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_distilbert_isha31101999_pipeline_en.md new file mode 100644 index 00000000000000..287f993538211e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_distilbert_isha31101999_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_distilbert_isha31101999_pipeline pipeline DistilBertForSequenceClassification from Isha31101999 +author: John Snow Labs +name: fine_tuned_distilbert_isha31101999_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_isha31101999_pipeline` is a English model originally trained by Isha31101999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_isha31101999_pipeline_en_5.5.0_3.0_1726394384718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_isha31101999_pipeline_en_5.5.0_3.0_1726394384718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_distilbert_isha31101999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_distilbert_isha31101999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_isha31101999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Isha31101999/fine-tuned-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_sentiment_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_sentiment_v1_pipeline_en.md new file mode 100644 index 00000000000000..175846f408db5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_sentiment_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_distilbert_sentiment_v1_pipeline pipeline DistilBertForSequenceClassification from Aleksia +author: John Snow Labs +name: finetuning_distilbert_sentiment_v1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_distilbert_sentiment_v1_pipeline` is a English model originally trained by Aleksia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_distilbert_sentiment_v1_pipeline_en_5.5.0_3.0_1726406392546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_distilbert_sentiment_v1_pipeline_en_5.5.0_3.0_1726406392546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_distilbert_sentiment_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_distilbert_sentiment_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_distilbert_sentiment_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Aleksia/finetuning-distilBert_sentiment_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_juliocrm_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_juliocrm_en.md new file mode 100644 index 00000000000000..aa243dc2a47427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_juliocrm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_juliocrm DistilBertForSequenceClassification from Juliocrm +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_juliocrm +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_juliocrm` is a English model originally trained by Juliocrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_juliocrm_en_5.5.0_3.0_1726406469868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_juliocrm_en_5.5.0_3.0_1726406469868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_juliocrm","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_juliocrm", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_juliocrm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Juliocrm/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_klumdedum_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_klumdedum_en.md new file mode 100644 index 00000000000000..bd461aa3f4967c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_klumdedum_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_klumdedum DistilBertForSequenceClassification from klumdedum +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_klumdedum +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_klumdedum` is a English model originally trained by klumdedum. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_klumdedum_en_5.5.0_3.0_1726394199274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_klumdedum_en_5.5.0_3.0_1726394199274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_klumdedum","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_klumdedum", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_klumdedum| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/klumdedum/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_klumdedum_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_klumdedum_pipeline_en.md new file mode 100644 index 00000000000000..5f1d74004281b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_klumdedum_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_klumdedum_pipeline pipeline DistilBertForSequenceClassification from klumdedum +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_klumdedum_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_klumdedum_pipeline` is a English model originally trained by klumdedum. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_klumdedum_pipeline_en_5.5.0_3.0_1726394211670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_klumdedum_pipeline_en_5.5.0_3.0_1726394211670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_klumdedum_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_klumdedum_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_klumdedum_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/klumdedum/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_vipuljain_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_vipuljain_pipeline_en.md new file mode 100644 index 00000000000000..cd9799f08e5131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_vipuljain_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_vipuljain_pipeline pipeline DistilBertForSequenceClassification from vipuljain +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_vipuljain_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_vipuljain_pipeline` is a English model originally trained by vipuljain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vipuljain_pipeline_en_5.5.0_3.0_1726366211036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vipuljain_pipeline_en_5.5.0_3.0_1726366211036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_vipuljain_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_vipuljain_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_vipuljain_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vipuljain/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-formalberta2_en.md b/docs/_posts/ahmedlone127/2024-09-15-formalberta2_en.md new file mode 100644 index 00000000000000..26e46513747695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-formalberta2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English formalberta2 RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: formalberta2 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`formalberta2` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/formalberta2_en_5.5.0_3.0_1726383400733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/formalberta2_en_5.5.0_3.0_1726383400733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("formalberta2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("formalberta2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
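+
+Each entry of the `embeddings` output column is an annotation whose `result` field holds the token and whose `embeddings` field holds its vector. A short sketch for pulling them out, assuming `pipelineDF` from the Python example above:
+
+```python
+from pyspark.sql import functions as F
+
+# One row per token: the token text and its embedding vector.
+tokens = pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("token"),
+            F.col("emb.embeddings").alias("vector"))
+tokens.show(5, truncate=False)
+```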
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|formalberta2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/BigSalmon/FormalBerta2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-furina_seed42_eng_amh_hau_cross_latin_2e_05_en.md b/docs/_posts/ahmedlone127/2024-09-15-furina_seed42_eng_amh_hau_cross_latin_2e_05_en.md new file mode 100644 index 00000000000000..da30fd39c03954 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-furina_seed42_eng_amh_hau_cross_latin_2e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina_seed42_eng_amh_hau_cross_latin_2e_05 XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_amh_hau_cross_latin_2e_05 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_amh_hau_cross_latin_2e_05` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_amh_hau_cross_latin_2e_05_en_5.5.0_3.0_1726373328243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_amh_hau_cross_latin_2e_05_en_5.5.0_3.0_1726373328243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_amh_hau_cross_latin_2e_05","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_amh_hau_cross_latin_2e_05", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
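+
+Besides the label in `result`, sequence-classification annotations usually keep the per-class scores in their `metadata` map. A sketch for surfacing both, assuming `pipelineDF` from the Python example above:
+
+```python
+from pyspark.sql import functions as F
+
+# "metadata" is a string-to-string map; for classifiers it typically holds the label scores.
+pipelineDF.select(F.explode("class").alias("pred")) \
+    .select(F.col("pred.result").alias("label"),
+            F.col("pred.metadata").alias("scores")) \
+    .show(truncate=False)
+```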
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_amh_hau_cross_latin_2e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_amh_hau_cross_latin_2e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-furina_seed42_eng_kinyarwanda_amh_cross_0_0001_en.md b/docs/_posts/ahmedlone127/2024-09-15-furina_seed42_eng_kinyarwanda_amh_cross_0_0001_en.md new file mode 100644 index 00000000000000..accef423f011fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-furina_seed42_eng_kinyarwanda_amh_cross_0_0001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina_seed42_eng_kinyarwanda_amh_cross_0_0001 XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_kinyarwanda_amh_cross_0_0001 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_kinyarwanda_amh_cross_0_0001` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_amh_cross_0_0001_en_5.5.0_3.0_1726434142869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_amh_cross_0_0001_en_5.5.0_3.0_1726434142869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_kinyarwanda_amh_cross_0_0001","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_kinyarwanda_amh_cross_0_0001", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_kinyarwanda_amh_cross_0_0001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_kin_amh_cross_0.0001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en.md b/docs/_posts/ahmedlone127/2024-09-15-gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en.md new file mode 100644 index 00000000000000..a1f0d89268fee4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac BertForSequenceClassification from tanoManzo +author: John Snow Labs +name: gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac` is a English model originally trained by tanoManzo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en_5.5.0_3.0_1726379091815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en_5.5.0_3.0_1726379091815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tanoManzo/gena-lm-bert-large-t2t_ft_Hepg2_1kbpHG19_DHSs_H3K27AC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md new file mode 100644 index 00000000000000..dee3a91e6616aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline pipeline BertForSequenceClassification from tanoManzo +author: John Snow Labs +name: gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline` is a English model originally trained by tanoManzo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en_5.5.0_3.0_1726379146744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en_5.5.0_3.0_1726379146744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gena_lm_bert_large_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tanoManzo/gena-lm-bert-large-t2t_ft_Hepg2_1kbpHG19_DHSs_H3K27AC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-geocoder_coordinates_model_en.md b/docs/_posts/ahmedlone127/2024-09-15-geocoder_coordinates_model_en.md new file mode 100644 index 00000000000000..6663c89b4b4c4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-geocoder_coordinates_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English geocoder_coordinates_model XlmRoBertaForSequenceClassification from azamat +author: John Snow Labs +name: geocoder_coordinates_model +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`geocoder_coordinates_model` is a English model originally trained by azamat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/geocoder_coordinates_model_en_5.5.0_3.0_1726373211506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/geocoder_coordinates_model_en_5.5.0_3.0_1726373211506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("geocoder_coordinates_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("geocoder_coordinates_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|geocoder_coordinates_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|886.3 MB| + +## References + +https://huggingface.co/azamat/geocoder_coordinates_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-grammar_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-grammar_classifier_pipeline_en.md new file mode 100644 index 00000000000000..b8654388fcac04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-grammar_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English grammar_classifier_pipeline pipeline DistilBertForSequenceClassification from gratkadlafana +author: John Snow Labs +name: grammar_classifier_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`grammar_classifier_pipeline` is a English model originally trained by gratkadlafana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/grammar_classifier_pipeline_en_5.5.0_3.0_1726385600967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/grammar_classifier_pipeline_en_5.5.0_3.0_1726385600967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("grammar_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("grammar_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|grammar_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gratkadlafana/grammar_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-graphcodebert_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-graphcodebert_v1_pipeline_en.md new file mode 100644 index 00000000000000..12ffae6affacf6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-graphcodebert_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English graphcodebert_v1_pipeline pipeline RoBertaEmbeddings from Ensheng +author: John Snow Labs +name: graphcodebert_v1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`graphcodebert_v1_pipeline` is a English model originally trained by Ensheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/graphcodebert_v1_pipeline_en_5.5.0_3.0_1726383815070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/graphcodebert_v1_pipeline_en_5.5.0_3.0_1726383815070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("graphcodebert_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("graphcodebert_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|graphcodebert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/Ensheng/graphcodebert-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_en.md b/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_en.md new file mode 100644 index 00000000000000..49a178e714fd51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English hamsa_tiny_v0_8 WhisperForCTC from Ahmed107 +author: John Snow Labs +name: hamsa_tiny_v0_8 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hamsa_tiny_v0_8` is a English model originally trained by Ahmed107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hamsa_tiny_v0_8_en_5.5.0_3.0_1726427589748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hamsa_tiny_v0_8_en_5.5.0_3.0_1726427589748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("hamsa_tiny_v0_8","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("hamsa_tiny_v0_8", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
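+
+Both snippets above assume a DataFrame named `data` with an `audio_content` column of float samples. A minimal sketch of building one; `soundfile` is an assumed helper here, and the audio should be mono and resampled to 16 kHz, which Whisper models generally expect:
+
+```python
+import soundfile as sf
+from pyspark.sql.types import ArrayType, FloatType, StructField, StructType
+
+# Load a mono 16 kHz WAV file as float32 samples (resample beforehand if needed).
+samples, sampling_rate = sf.read("sample.wav", dtype="float32")
+
+schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
+data = spark.createDataFrame([(samples.tolist(),)], schema=schema)
+```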
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hamsa_tiny_v0_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.5 MB| + +## References + +https://huggingface.co/Ahmed107/hamsa-tiny-v0.8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hate_hate_balance_random1_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-15-hate_hate_balance_random1_seed1_bernice_en.md new file mode 100644 index 00000000000000..b183408a2a8b8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hate_hate_balance_random1_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random1_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random1_seed1_bernice +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random1_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_bernice_en_5.5.0_3.0_1726374228987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_bernice_en_5.5.0_3.0_1726374228987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random1_seed1_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random1_seed1_bernice", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
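After `transform`, the predicted label for each row sits in the `class` annotation column defined above. A short sketch for inspecting it with the `pipelineDF` built in the example:

```python
# class.result holds the predicted label(s); class.metadata carries the per-label scores.
pipelineDF.select("text", "class.result").show(truncate=False)
```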
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random1_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.4 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random1_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hatespeech_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-hatespeech_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..287eda8edd89aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hatespeech_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hatespeech_distilbert_pipeline pipeline DistilBertForSequenceClassification from DL-Project +author: John Snow Labs +name: hatespeech_distilbert_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatespeech_distilbert_pipeline` is a English model originally trained by DL-Project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatespeech_distilbert_pipeline_en_5.5.0_3.0_1726366300008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatespeech_distilbert_pipeline_en_5.5.0_3.0_1726366300008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hatespeech_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hatespeech_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
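The snippet above assumes a DataFrame `df` with a `text` column already exists. Below is a hedged end-to-end sketch; the `class` output column name is an assumption based on the included DistilBertForSequenceClassification stage, so check the schema if your download differs:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hatespeech_distilbert_pipeline", lang="en")

# Toy input with the "text" column the pipeline expects.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
annotations.printSchema()                  # confirm the output columns of the downloaded pipeline
annotations.select("class.result").show()  # assumed classification output column
```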
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatespeech_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DL-Project/hatespeech_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hodravidianlangtech_en.md b/docs/_posts/ahmedlone127/2024-09-15-hodravidianlangtech_en.md new file mode 100644 index 00000000000000..3d6af26f634c16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hodravidianlangtech_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hodravidianlangtech XlmRoBertaForSequenceClassification from Udoy +author: John Snow Labs +name: hodravidianlangtech +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hodravidianlangtech` is a English model originally trained by Udoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hodravidianlangtech_en_5.5.0_3.0_1726440831561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hodravidianlangtech_en_5.5.0_3.0_1726440831561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hodravidianlangtech","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hodravidianlangtech", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hodravidianlangtech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|780.9 MB| + +## References + +https://huggingface.co/Udoy/HODravidianLangTech \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hybrid_qa_en.md b/docs/_posts/ahmedlone127/2024-09-15-hybrid_qa_en.md new file mode 100644 index 00000000000000..a5b4a7b021ebbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hybrid_qa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English hybrid_qa DistilBertForQuestionAnswering from justinhl +author: John Snow Labs +name: hybrid_qa +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hybrid_qa` is a English model originally trained by justinhl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hybrid_qa_en_5.5.0_3.0_1726435125771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hybrid_qa_en_5.5.0_3.0_1726435125771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("hybrid_qa","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("hybrid_qa", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
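The extracted answer span is written to the `answer` annotation column. A small sketch using the `pipelineDF` built above:

```python
# Show the predicted answer for each question/context pair.
pipelineDF.select("question", "answer.result").show(truncate=False)
```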
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hybrid_qa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|248.0 MB| + +## References + +https://huggingface.co/justinhl/hybrid-qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-ift6758_devoir_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-ift6758_devoir_6_pipeline_en.md new file mode 100644 index 00000000000000..a1c42d7b824f63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-ift6758_devoir_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ift6758_devoir_6_pipeline pipeline DistilBertForSequenceClassification from iggy1055 +author: John Snow Labs +name: ift6758_devoir_6_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ift6758_devoir_6_pipeline` is a English model originally trained by iggy1055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ift6758_devoir_6_pipeline_en_5.5.0_3.0_1726406669178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ift6758_devoir_6_pipeline_en_5.5.0_3.0_1726406669178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ift6758_devoir_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ift6758_devoir_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ift6758_devoir_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iggy1055/ift6758_devoir_6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-imdb_text_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-imdb_text_classifier_pipeline_en.md new file mode 100644 index 00000000000000..e02b6244ffd295 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-imdb_text_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb_text_classifier_pipeline pipeline DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: imdb_text_classifier_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_text_classifier_pipeline` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_text_classifier_pipeline_en_5.5.0_3.0_1726385131160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_text_classifier_pipeline_en_5.5.0_3.0_1726385131160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdb_text_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdb_text_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_text_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/imdb_text_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-intent_classification_small_en.md b/docs/_posts/ahmedlone127/2024-09-15-intent_classification_small_en.md new file mode 100644 index 00000000000000..c12227a1c1679c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-intent_classification_small_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English intent_classification_small DistilBertForSequenceClassification from dipesh +author: John Snow Labs +name: intent_classification_small +date: 2024-09-15 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_classification_small` is a English model originally trained by dipesh. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_classification_small_en_5.5.0_3.0_1726385197886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_classification_small_en_5.5.0_3.0_1726385197886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("intent_classification_small","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("intent_classification_small","en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_classification_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.2 MB| + +## References + +References + +https://huggingface.co/dipesh/Intent-Classification-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline_kk.md b/docs/_posts/ahmedlone127/2024-09-15-kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline_kk.md new file mode 100644 index 00000000000000..5f1ce36b5ea82e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline_kk.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Kazakh kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline pipeline RoBertaForQuestionAnswering from med-alex +author: John Snow Labs +name: kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline +date: 2024-09-15 +tags: [kk, open_source, pipeline, onnx] +task: Question Answering +language: kk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline` is a Kazakh model originally trained by med-alex. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline_kk_5.5.0_3.0_1726363704224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline_kk_5.5.0_3.0_1726363704224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline", lang = "kk") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline", lang = "kk") +val annotations = pipeline.transform(df) + +``` +
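This pretrained question-answering pipeline begins with a MultiDocumentAssembler, so the input DataFrame `df` referenced above presumably needs `question` and `context` columns. A hedged sketch of how it might be fed (the column names and output schema are assumptions; verify with `printSchema()`):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline", lang="kk")

# Assumed input columns for the MultiDocumentAssembler stage.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

annotations = pipeline.transform(df)
annotations.printSchema()  # confirm the answer output column before selecting it
```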
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kaz_roberta_base_ft_qa_turkish_maltese_tonga_tonga_islands_kaz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|kk| +|Size:|311.7 MB| + +## References + +https://huggingface.co/med-alex/kaz-roberta-base-ft-qa-tr-mt-to-kaz + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-large_np_nw_all_en.md b/docs/_posts/ahmedlone127/2024-09-15-large_np_nw_all_en.md new file mode 100644 index 00000000000000..fd025c9803d72a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-large_np_nw_all_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English large_np_nw_all WhisperForCTC from roviso +author: John Snow Labs +name: large_np_nw_all +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`large_np_nw_all` is a English model originally trained by roviso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/large_np_nw_all_en_5.5.0_3.0_1726409215485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/large_np_nw_all_en_5.5.0_3.0_1726409215485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with an "audio_content" column
# holding the raw audio as an array of floats, for example:
# data = spark.createDataFrame([[raw_audio_floats]]).toDF("audio_content")

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("large_np_nw_all","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("large_np_nw_all", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|large_np_nw_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/roviso/large-np-nw-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-malayalam_anomaly_en.md b/docs/_posts/ahmedlone127/2024-09-15-malayalam_anomaly_en.md new file mode 100644 index 00000000000000..41ffa3834535db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-malayalam_anomaly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English malayalam_anomaly DistilBertForSequenceClassification from rn7s2 +author: John Snow Labs +name: malayalam_anomaly +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_anomaly` is a English model originally trained by rn7s2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_anomaly_en_5.5.0_3.0_1726366496738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_anomaly_en_5.5.0_3.0_1726366496738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("malayalam_anomaly","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("malayalam_anomaly", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_anomaly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rn7s2/ml_anomaly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-mini_text_classification_finetune_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-mini_text_classification_finetune_model_pipeline_en.md new file mode 100644 index 00000000000000..af71aefdafb4de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-mini_text_classification_finetune_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mini_text_classification_finetune_model_pipeline pipeline XlmRoBertaForSequenceClassification from mankra +author: John Snow Labs +name: mini_text_classification_finetune_model_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mini_text_classification_finetune_model_pipeline` is a English model originally trained by mankra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mini_text_classification_finetune_model_pipeline_en_5.5.0_3.0_1726373104954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mini_text_classification_finetune_model_pipeline_en_5.5.0_3.0_1726373104954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mini_text_classification_finetune_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mini_text_classification_finetune_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mini_text_classification_finetune_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|333.7 MB| + +## References + +https://huggingface.co/mankra/mini_text_classification_finetune_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-model4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-model4_pipeline_en.md new file mode 100644 index 00000000000000..54049b7a7076cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-model4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model4_pipeline pipeline DistilBertForSequenceClassification from jfunk14 +author: John Snow Labs +name: model4_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model4_pipeline` is a English model originally trained by jfunk14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model4_pipeline_en_5.5.0_3.0_1726385541935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model4_pipeline_en_5.5.0_3.0_1726385541935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.1 MB| + +## References + +https://huggingface.co/jfunk14/model4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-model_parallelism_en.md b/docs/_posts/ahmedlone127/2024-09-15-model_parallelism_en.md new file mode 100644 index 00000000000000..a23da9681246bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-model_parallelism_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_parallelism XlmRoBertaForSequenceClassification from Grigory1 +author: John Snow Labs +name: model_parallelism +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_parallelism` is a English model originally trained by Grigory1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_parallelism_en_5.5.0_3.0_1726433662859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_parallelism_en_5.5.0_3.0_1726433662859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("model_parallelism","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("model_parallelism", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_parallelism| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|911.9 MB| + +## References + +https://huggingface.co/Grigory1/model_parallelism \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-models_mil00_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-models_mil00_pipeline_en.md new file mode 100644 index 00000000000000..28278101fb008d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-models_mil00_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English models_mil00_pipeline pipeline DistilBertForSequenceClassification from Mil00 +author: John Snow Labs +name: models_mil00_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`models_mil00_pipeline` is a English model originally trained by Mil00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/models_mil00_pipeline_en_5.5.0_3.0_1726393786262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/models_mil00_pipeline_en_5.5.0_3.0_1726393786262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("models_mil00_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("models_mil00_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|models_mil00_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.2 MB| + +## References + +https://huggingface.co/Mil00/Models + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1_en.md b/docs/_posts/ahmedlone127/2024-09-15-output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1_en.md new file mode 100644 index 00000000000000..285ed4c015dc70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1 RoBertaForQuestionAnswering from AnonymousSub +author: John Snow Labs +name: output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1_en_5.5.0_3.0_1726369147559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1_en_5.5.0_3.0_1726369147559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output_mask_two_steps_pretraining_plus_contr_roberta_large_epochs_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/AnonymousSub/output_mask_TWO_STEPS_pretraining_plus_contr_roberta-large_EPOCHS_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-phraseberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-phraseberta_pipeline_en.md new file mode 100644 index 00000000000000..52451cb1dd69ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-phraseberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phraseberta_pipeline pipeline RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: phraseberta_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phraseberta_pipeline` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phraseberta_pipeline_en_5.5.0_3.0_1726413992869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phraseberta_pipeline_en_5.5.0_3.0_1726413992869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phraseberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phraseberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
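For an embeddings pipeline like this one, the useful output is the token-level vectors produced by the RoBertaEmbeddings stage. A hedged sketch follows; the embeddings column name is an assumption, so confirm it with `printSchema()` on the transformed DataFrame:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("phraseberta_pipeline", lang="en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

result = pipeline.transform(df)
result.printSchema()  # locate the embeddings column produced by the RoBertaEmbeddings stage

# Assuming the embeddings column is named "roberta" (verify with printSchema above):
result.selectExpr("explode(roberta) as emb") \
      .selectExpr("emb.result as token", "emb.embeddings as vector") \
      .show(truncate=80)
```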
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phraseberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.8 MB| + +## References + +https://huggingface.co/BigSalmon/PhraseBerta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-platzi_distilroberta_base_mrpc_glue_andres_rojas_en.md b/docs/_posts/ahmedlone127/2024-09-15-platzi_distilroberta_base_mrpc_glue_andres_rojas_en.md new file mode 100644 index 00000000000000..a33d827e1e1d0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-platzi_distilroberta_base_mrpc_glue_andres_rojas_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_andres_rojas RoBertaForSequenceClassification from ulichovick +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_andres_rojas +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_andres_rojas` is a English model originally trained by ulichovick. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_rojas_en_5.5.0_3.0_1726440059777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_rojas_en_5.5.0_3.0_1726440059777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_andres_rojas","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_andres_rojas", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_andres_rojas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/ulichovick/platzi-distilroberta-base-mrpc-glue-Andres-Rojas \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-portuguese_up_xlmr_falsefalse_0_0_best_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-portuguese_up_xlmr_falsefalse_0_0_best_pipeline_en.md new file mode 100644 index 00000000000000..57043d22a13886 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-portuguese_up_xlmr_falsefalse_0_0_best_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English portuguese_up_xlmr_falsefalse_0_0_best_pipeline pipeline XlmRoBertaForSequenceClassification from harish +author: John Snow Labs +name: portuguese_up_xlmr_falsefalse_0_0_best_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`portuguese_up_xlmr_falsefalse_0_0_best_pipeline` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_falsefalse_0_0_best_pipeline_en_5.5.0_3.0_1726434690005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_falsefalse_0_0_best_pipeline_en_5.5.0_3.0_1726434690005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("portuguese_up_xlmr_falsefalse_0_0_best_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("portuguese_up_xlmr_falsefalse_0_0_best_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|portuguese_up_xlmr_falsefalse_0_0_best_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|782.4 MB| + +## References + +https://huggingface.co/harish/PT-UP-xlmR-FalseFalse-0_0_BEST + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-ratingbook_en.md b/docs/_posts/ahmedlone127/2024-09-15-ratingbook_en.md new file mode 100644 index 00000000000000..8d8f8d94058a15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-ratingbook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ratingbook DistilBertForSequenceClassification from DragonImortal +author: John Snow Labs +name: ratingbook +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ratingbook` is a English model originally trained by DragonImortal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ratingbook_en_5.5.0_3.0_1726394095682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ratingbook_en_5.5.0_3.0_1726394095682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ratingbook","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ratingbook", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ratingbook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DragonImortal/Ratingbook \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_nmb_paperspace_hf_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_nmb_paperspace_hf_en.md new file mode 100644 index 00000000000000..acb5fd241e1325 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_nmb_paperspace_hf_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_nmb_paperspace_hf RoBertaForQuestionAnswering from nmb-paperspace-hf +author: John Snow Labs +name: roberta_base_finetuned_squad_nmb_paperspace_hf +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_nmb_paperspace_hf` is a English model originally trained by nmb-paperspace-hf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_nmb_paperspace_hf_en_5.5.0_3.0_1726369295994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_nmb_paperspace_hf_en_5.5.0_3.0_1726369295994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_nmb_paperspace_hf","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_nmb_paperspace_hf", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_nmb_paperspace_hf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|298.4 MB| + +## References + +https://huggingface.co/nmb-paperspace-hf/roberta-base-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model1_pipeline_en.md new file mode 100644 index 00000000000000..3a44d0344bf291 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_lora_591k_squad_model1_pipeline pipeline RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_base_lora_591k_squad_model1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_lora_591k_squad_model1_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_lora_591k_squad_model1_pipeline_en_5.5.0_3.0_1726379810102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_lora_591k_squad_model1_pipeline_en_5.5.0_3.0_1726379810102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_lora_591k_squad_model1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_lora_591k_squad_model1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_lora_591k_squad_model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|317.4 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-base-lora-591K-squad-model1 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_squad2_finetuned_squad_ahujaniharika95_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_squad2_finetuned_squad_ahujaniharika95_en.md new file mode 100644 index 00000000000000..03ce4e9f75b45e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_squad2_finetuned_squad_ahujaniharika95_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_squad2_finetuned_squad_ahujaniharika95 RoBertaForQuestionAnswering from ahujaniharika95 +author: John Snow Labs +name: roberta_base_squad2_finetuned_squad_ahujaniharika95 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad2_finetuned_squad_ahujaniharika95` is a English model originally trained by ahujaniharika95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_finetuned_squad_ahujaniharika95_en_5.5.0_3.0_1726379593456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad2_finetuned_squad_ahujaniharika95_en_5.5.0_3.0_1726379593456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad2_finetuned_squad_ahujaniharika95","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad2_finetuned_squad_ahujaniharika95", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
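
Note that on recent Spark NLP releases the multi-column setters on `MultiDocumentAssembler` are the plural `setInputCols`/`setOutputCols`. A variant of the assembler definition, plus a quick way to inspect the predicted span from `pipelineDF` above, might look like this (sketch only):

```python
from sparknlp.base import MultiDocumentAssembler

# Plural setters, as exposed by current Spark NLP releases (sketch, not the only valid form).
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Continuing the example above: pull the predicted answer text out of the `answer` column.
pipelineDF.selectExpr("explode(answer.result) as predicted_answer").show(truncate=False)
```
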
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad2_finetuned_squad_ahujaniharika95| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.3 MB| + +## References + +https://huggingface.co/ahujaniharika95/roberta-base-squad2-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_wikimovies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_wikimovies_pipeline_en.md new file mode 100644 index 00000000000000..4469a021b53e25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_wikimovies_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_wikimovies_pipeline pipeline RoBertaEmbeddings from thatdramebaazguy +author: John Snow Labs +name: roberta_base_wikimovies_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_wikimovies_pipeline` is a English model originally trained by thatdramebaazguy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_wikimovies_pipeline_en_5.5.0_3.0_1726414105024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_wikimovies_pipeline_en_5.5.0_3.0_1726414105024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_wikimovies_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_wikimovies_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
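
For quick, single-document experiments the pipeline can also be driven without building a DataFrame first. The sketch below uses `fullAnnotate` and assumes the pipeline exposes its RoBERTa vectors under an `embeddings` output column.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_wikimovies_pipeline", lang="en")

# fullAnnotate returns Annotation objects, which carry the embedding vectors.
result = pipeline.fullAnnotate("A movie about wizards and a magic school")[0]

# Output column name is an assumption; adjust if the pipeline uses a different one.
for annotation in result["embeddings"]:
    print(annotation.result, len(annotation.embeddings))
```
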
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_wikimovies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/thatdramebaazguy/roberta-base-wikimovies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_movies_2_vmg1957_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_movies_2_vmg1957_en.md new file mode 100644 index 00000000000000..bb1c8ac6465fdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_movies_2_vmg1957_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_vmg1957 RoBertaForQuestionAnswering from vmg1957 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_vmg1957 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_vmg1957` is a English model originally trained by vmg1957. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vmg1957_en_5.5.0_3.0_1726404321540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vmg1957_en_5.5.0_3.0_1726404321540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vmg1957","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vmg1957", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_vmg1957| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/vmg1957/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_movies_2_vmg1957_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_movies_2_vmg1957_pipeline_en.md new file mode 100644 index 00000000000000..a7564a5501577a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_movies_2_vmg1957_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_vmg1957_pipeline pipeline RoBertaForQuestionAnswering from vmg1957 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_vmg1957_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_vmg1957_pipeline` is a English model originally trained by vmg1957. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vmg1957_pipeline_en_5.5.0_3.0_1726404343940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vmg1957_pipeline_en_5.5.0_3.0_1726404343940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_subjqa_movies_2_vmg1957_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_movies_2_vmg1957_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_vmg1957_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/vmg1957/roberta-finetuned-subjqa-movies_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_restaurants_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_restaurants_2_pipeline_en.md new file mode 100644 index 00000000000000..996f781fd1c5fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_subjqa_restaurants_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_restaurants_2_pipeline pipeline RoBertaForQuestionAnswering from velikiy +author: John Snow Labs +name: roberta_finetuned_subjqa_restaurants_2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_restaurants_2_pipeline` is a English model originally trained by velikiy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_restaurants_2_pipeline_en_5.5.0_3.0_1726364226972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_restaurants_2_pipeline_en_5.5.0_3.0_1726364226972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_subjqa_restaurants_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_restaurants_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_restaurants_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/velikiy/roberta-finetuned-subjqa-restaurants_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_bne_sqac_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_bne_sqac_finetuned_squad_en.md new file mode 100644 index 00000000000000..4147f301ea3d13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_bne_sqac_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_bne_sqac_finetuned_squad RoBertaForQuestionAnswering from osuna +author: John Snow Labs +name: roberta_large_bne_sqac_finetuned_squad +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_sqac_finetuned_squad` is a English model originally trained by osuna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_sqac_finetuned_squad_en_5.5.0_3.0_1726404395568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_sqac_finetuned_squad_en_5.5.0_3.0_1726404395568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_bne_sqac_finetuned_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_bne_sqac_finetuned_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
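
Once `pipelineModel` from the snippet above is fitted, a `LightPipeline` gives low-latency, driver-side inference. The sketch below assumes the two-argument `fullAnnotate(question, context)` form used with MultiDocumentAssembler-based pipelines; treat the argument order and the `answer` key as assumptions.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# First argument feeds the question column, second the context column (assumed order).
result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")[0]
print([a.result for a in result["answer"]])
```
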
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_sqac_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/osuna/roberta-large-bne-sqac-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline_en.md new file mode 100644 index 00000000000000..d3cf1327a018e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline pipeline RoBertaForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline_en_5.5.0_3.0_1726369175675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline_en_5.5.0_3.0_1726369175675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_few_shot_k_16_finetuned_squad_seed_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anas-awadalla/roberta-large-few-shot-k-16-finetuned-squad-seed-4 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..d5ebb2304618c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_mrpc_pipeline pipeline RoBertaForSequenceClassification from VitaliiVrublevskyi +author: John Snow Labs +name: roberta_large_finetuned_mrpc_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_mrpc_pipeline` is a English model originally trained by VitaliiVrublevskyi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_mrpc_pipeline_en_5.5.0_3.0_1726401748447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_mrpc_pipeline_en_5.5.0_3.0_1726401748447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_mrpc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_mrpc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
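
As with the other pretrained pipelines, `df` is expected to be a Spark DataFrame. Since this pipeline starts with a DocumentAssembler, a plain `text` column is assumed to be sufficient (sketch only):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_large_finetuned_mrpc_pipeline", lang="en")

# MRPC is a sentence-pair task; feeding both sentences in one text field is an
# illustrative assumption, not a statement about how the model was trained.
df = spark.createDataFrame(
    [["The company said profits rose. Profits increased, the company reported."]]
).toDF("text")

annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)  # predicted label(s); column name assumed
```
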
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/VitaliiVrublevskyi/roberta-large-finetuned-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_squad_lahen_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_squad_lahen_en.md new file mode 100644 index 00000000000000..919055cbcf96e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_squad_lahen_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_finetuned_squad_lahen RoBertaForQuestionAnswering from Lahen +author: John Snow Labs +name: roberta_large_finetuned_squad_lahen +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_squad_lahen` is a English model originally trained by Lahen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_squad_lahen_en_5.5.0_3.0_1726404643442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_squad_lahen_en_5.5.0_3.0_1726404643442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_finetuned_squad_lahen","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_finetuned_squad_lahen", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_squad_lahen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Lahen/roberta-large-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_itpt_fp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_itpt_fp_pipeline_en.md new file mode 100644 index 00000000000000..f395c766526940 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_itpt_fp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_itpt_fp_pipeline pipeline RoBertaEmbeddings from geninhu +author: John Snow Labs +name: roberta_large_itpt_fp_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_itpt_fp_pipeline` is a English model originally trained by geninhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_itpt_fp_pipeline_en_5.5.0_3.0_1726413850609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_itpt_fp_pipeline_en_5.5.0_3.0_1726413850609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_itpt_fp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_itpt_fp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_itpt_fp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/geninhu/roberta_large_ITPT_FP + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_spanish_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_spanish_v2_pipeline_en.md new file mode 100644 index 00000000000000..5f25952525de35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_spanish_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_spanish_v2_pipeline pipeline RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_spanish_v2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_v2_pipeline` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_v2_pipeline_en_5.5.0_3.0_1726364064348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_v2_pipeline_en_5.5.0_3.0_1726364064348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_spanish_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_spanish_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.5 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_es_v2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_tuned_trial_13_13_2022_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_tuned_trial_13_13_2022_en.md new file mode 100644 index 00000000000000..bd3418b5698615 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_tuned_trial_13_13_2022_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tuned_trial_13_13_2022 RoBertaEmbeddings from EslamAhmed +author: John Snow Labs +name: roberta_tuned_trial_13_13_2022 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tuned_trial_13_13_2022` is a English model originally trained by EslamAhmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tuned_trial_13_13_2022_en_5.5.0_3.0_1726383317555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tuned_trial_13_13_2022_en_5.5.0_3.0_1726383317555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_tuned_trial_13_13_2022","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_tuned_trial_13_13_2022","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
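
To turn the annotation structs produced above into plain vectors, an `EmbeddingsFinisher` can be appended to the same pipeline. A sketch under the column names used in the snippet:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

# Reuses documentAssembler, tokenizer and embeddings from the snippet above.
pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)

# One vector per token, exposed as a Spark ML vector column.
result.selectExpr("explode(finished_embeddings) as token_vector").show(truncate=80)
```
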
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tuned_trial_13_13_2022| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/EslamAhmed/roberta_tuned_trial_13-13-2022 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-rubert_tiny_review_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-rubert_tiny_review_pipeline_en.md new file mode 100644 index 00000000000000..bc1a134e25164f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-rubert_tiny_review_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rubert_tiny_review_pipeline pipeline BertForSequenceClassification from igorktech +author: John Snow Labs +name: rubert_tiny_review_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny_review_pipeline` is a English model originally trained by igorktech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny_review_pipeline_en_5.5.0_3.0_1726378207628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny_review_pipeline_en_5.5.0_3.0_1726378207628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_tiny_review_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_tiny_review_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny_review_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/igorktech/rubert-tiny-review + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-ruberttiny_attr_en.md b/docs/_posts/ahmedlone127/2024-09-15-ruberttiny_attr_en.md new file mode 100644 index 00000000000000..91d5ed081fee90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-ruberttiny_attr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ruberttiny_attr BertForSequenceClassification from Shakhovak +author: John Snow Labs +name: ruberttiny_attr +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruberttiny_attr` is a English model originally trained by Shakhovak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruberttiny_attr_en_5.5.0_3.0_1726376156246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruberttiny_attr_en_5.5.0_3.0_1726376156246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_attr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_attr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
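
One caveat on the snippet above: the classifier consumes the assembler and tokenizer outputs, so its input columns should be `document` and `token` (and the Scala tab should likewise use the singular `setInputCol`/`setOutputCol` on `DocumentAssembler`). A corrected minimal wiring plus label inspection, as a sketch:

```python
from sparknlp.annotator import BertForSequenceClassification

# Input columns matched to the DocumentAssembler and Tokenizer outputs above.
sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_attr", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# After refitting the pipeline with this stage, the predicted labels land in `class.result`.
pipelineDF.select("class.result").show(truncate=False)
```
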
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruberttiny_attr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|109.5 MB| + +## References + +https://huggingface.co/Shakhovak/ruBertTiny_attr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..15060e9b20eb60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726433006371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726433006371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
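
For ad-hoc checks, the pretrained pipeline can also be called directly on strings via `annotate`; the output key used below (`class`) is an assumption based on the Included Models list.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline(
    "scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline",
    lang="en"
)

# annotate() on a single string returns a dict mapping output columns to lists of results.
result = pipeline.annotate("Filmnya bagus banget!")  # Indonesian example text, illustrative only
print(result.get("class"))  # e.g. a single predicted sentiment label; key name assumed
```
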
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_from_scratch_data_indolem_sentiment_model_xlm_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|883.5 MB| + +## References + +https://huggingface.co/haryoaw/scenario-non-kd-from-scratch-data-indolem_sentiment-model-xlm-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-15-scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline_xx.md new file mode 100644 index 00000000000000..87805d1205ea0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline +date: 2024-09-15 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline` is a Multilingual model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline_xx_5.5.0_3.0_1726440683200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline_xx_5.5.0_3.0_1726440683200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_tcr_4_data_cardiffnlp_tweet_sentiment_multilingual_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|852.4 MB| + +## References + +https://huggingface.co/haryoaw/scenario-TCR-4_data-cardiffnlp_tweet_sentiment_multilingual_all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_archaeobert_en.md b/docs/_posts/ahmedlone127/2024-09-15-sent_archaeobert_en.md new file mode 100644 index 00000000000000..4c310e1d33c980 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_archaeobert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_archaeobert BertSentenceEmbeddings from alexbrandsen +author: John Snow Labs +name: sent_archaeobert +date: 2024-09-15 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_archaeobert` is a English model originally trained by alexbrandsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_archaeobert_en_5.5.0_3.0_1726394949543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_archaeobert_en_5.5.0_3.0_1726394949543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_archaeobert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_archaeobert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
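
Each sentence detected by the `SentenceDetectorDLModel` stage gets one pooled vector. A short way to look at those vectors from `pipelineDF` above (sketch, column names as in the snippet):

```python
# One row per detected sentence: the sentence text next to the size of its pooled vector.
pipelineDF.selectExpr("explode(embeddings) as sentence_embedding") \
    .selectExpr(
        "sentence_embedding.result as sentence",
        "size(sentence_embedding.embeddings) as dimensions"
    ).show(truncate=False)
```
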
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_archaeobert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/alexbrandsen/ArchaeoBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_berel_2_0_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-15-sent_berel_2_0_pipeline_he.md new file mode 100644 index 00000000000000..0ece3af8a8c2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_berel_2_0_pipeline_he.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hebrew sent_berel_2_0_pipeline pipeline BertSentenceEmbeddings from dicta-il +author: John Snow Labs +name: sent_berel_2_0_pipeline +date: 2024-09-15 +tags: [he, open_source, pipeline, onnx] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_berel_2_0_pipeline` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_berel_2_0_pipeline_he_5.5.0_3.0_1726436503660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_berel_2_0_pipeline_he_5.5.0_3.0_1726436503660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_berel_2_0_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_berel_2_0_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_berel_2_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|690.5 MB| + +## References + +https://huggingface.co/dicta-il/BEREL_2.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_japanese_ssuw_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_japanese_ssuw_pipeline_ja.md new file mode 100644 index 00000000000000..211c5c7fb83a07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_japanese_ssuw_pipeline_ja.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Japanese sent_bert_base_japanese_ssuw_pipeline pipeline BertSentenceEmbeddings from ku-accms +author: John Snow Labs +name: sent_bert_base_japanese_ssuw_pipeline +date: 2024-09-15 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_japanese_ssuw_pipeline` is a Japanese model originally trained by ku-accms. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_japanese_ssuw_pipeline_ja_5.5.0_3.0_1726394775281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_japanese_ssuw_pipeline_ja_5.5.0_3.0_1726394775281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_japanese_ssuw_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_japanese_ssuw_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_japanese_ssuw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|412.5 MB| + +## References + +https://huggingface.co/ku-accms/bert-base-japanese-ssuw + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline_xx.md new file mode 100644 index 00000000000000..6a7fd68f6fb2ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline +date: 2024-09-15 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline_xx_5.5.0_3.0_1726394740281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline_xx_5.5.0_3.0_1726394740281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_dholuo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-luo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_multilingual_cased_finetuned_dholuo_xx.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_multilingual_cased_finetuned_dholuo_xx.md new file mode 100644 index 00000000000000..8bd8c5a0ee0fcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_multilingual_cased_finetuned_dholuo_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_dholuo BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_dholuo +date: 2024-09-15 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_dholuo` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_dholuo_xx_5.5.0_3.0_1726394707751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_dholuo_xx_5.5.0_3.0_1726394707751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_dholuo","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_dholuo","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
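
Because the vectors are sentence-level, a common follow-up is comparing two sentences. The sketch below reuses `pipelineModel` from the snippet above through a `LightPipeline` and computes cosine similarity with NumPy; the exact structure of the `fullAnnotate` output should be treated as an assumption.

```python
import numpy as np
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# One fullAnnotate result per input string; "embeddings" is the output column set above.
results = light.fullAnnotate(["I love spark-nlp", "Spark NLP is great"])  # illustrative inputs
vec_a = np.array(results[0]["embeddings"][0].embeddings)
vec_b = np.array(results[1]["embeddings"][0].embeddings)

cosine = float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))
print(f"cosine similarity: {cosine:.3f}")
```
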
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_dholuo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-luo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_vk_posts_en.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_vk_posts_en.md new file mode 100644 index 00000000000000..b6c182b9a134f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_vk_posts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_vk_posts BertSentenceEmbeddings from serggor +author: John Snow Labs +name: sent_bert_base_vk_posts +date: 2024-09-15 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_vk_posts` is a English model originally trained by serggor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_vk_posts_en_5.5.0_3.0_1726443123071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_vk_posts_en_5.5.0_3.0_1726443123071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vk_posts","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vk_posts","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_vk_posts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/serggor/bert-base-vk-posts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_indicbertv2_mlm_back_tlm_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-15-sent_indicbertv2_mlm_back_tlm_pipeline_xx.md new file mode 100644 index 00000000000000..292010ad72a094 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_indicbertv2_mlm_back_tlm_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_indicbertv2_mlm_back_tlm_pipeline pipeline BertSentenceEmbeddings from ai4bharat +author: John Snow Labs +name: sent_indicbertv2_mlm_back_tlm_pipeline +date: 2024-09-15 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_indicbertv2_mlm_back_tlm_pipeline` is a Multilingual model originally trained by ai4bharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_indicbertv2_mlm_back_tlm_pipeline_xx_5.5.0_3.0_1726377743276.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_indicbertv2_mlm_back_tlm_pipeline_xx_5.5.0_3.0_1726377743276.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_indicbertv2_mlm_back_tlm_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_indicbertv2_mlm_back_tlm_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_indicbertv2_mlm_back_tlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/ai4bharat/IndicBERTv2-MLM-Back-TLM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_mitre_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-15-sent_mitre_bert_base_uncased_en.md new file mode 100644 index 00000000000000..5be2125b29c34d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_mitre_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mitre_bert_base_uncased BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_mitre_bert_base_uncased +date: 2024-09-15 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mitre_bert_base_uncased` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_base_uncased_en_5.5.0_3.0_1726436745260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_base_uncased_en_5.5.0_3.0_1726436745260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mitre_bert_base_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mitre_bert_base_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mitre_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline_en.md new file mode 100644 index 00000000000000..72835b75b5da67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline_en_5.5.0_3.0_1726406091021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline_en_5.5.0_3.0_1726406091021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
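+
+A minimal sketch of the `df` assumed above (it is not defined in the snippet): since the pretrained pipeline starts from a DocumentAssembler, it expects a DataFrame with a `text` column.
+
+```python
+# Hypothetical input; any DataFrame with a "text" column works the same way.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+result = pipeline.transform(df)
+result.show()
+```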
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_30_2024_07_26_12_23_45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-30-2024-07-26_12-23-45 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline_en.md new file mode 100644 index 00000000000000..9116da5603ba84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline_en_5.5.0_3.0_1726405799187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline_en_5.5.0_3.0_1726405799187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_51_start_exp_time_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-51-START_EXP_TIME + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-takalane_ssw_roberta_pipeline_tn.md b/docs/_posts/ahmedlone127/2024-09-15-takalane_ssw_roberta_pipeline_tn.md new file mode 100644 index 00000000000000..6882f84b48c52f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-takalane_ssw_roberta_pipeline_tn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Tswana takalane_ssw_roberta_pipeline pipeline RoBertaEmbeddings from jannesg +author: John Snow Labs +name: takalane_ssw_roberta_pipeline +date: 2024-09-15 +tags: [tn, open_source, pipeline, onnx] +task: Embeddings +language: tn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`takalane_ssw_roberta_pipeline` is a Tswana model originally trained by jannesg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/takalane_ssw_roberta_pipeline_tn_5.5.0_3.0_1726413256828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/takalane_ssw_roberta_pipeline_tn_5.5.0_3.0_1726413256828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("takalane_ssw_roberta_pipeline", lang = "tn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("takalane_ssw_roberta_pipeline", lang = "tn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|takalane_ssw_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tn| +|Size:|310.2 MB| + +## References + +https://huggingface.co/jannesg/takalane_ssw_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-tamil_eng_data_en.md b/docs/_posts/ahmedlone127/2024-09-15-tamil_eng_data_en.md new file mode 100644 index 00000000000000..2e18ea188f2fe9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-tamil_eng_data_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tamil_eng_data WhisperForCTC from Achitha +author: John Snow Labs +name: tamil_eng_data +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamil_eng_data` is a English model originally trained by Achitha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamil_eng_data_en_5.5.0_3.0_1726421150135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamil_eng_data_en_5.5.0_3.0_1726421150135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("tamil_eng_data","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("tamil_eng_data", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamil_eng_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/Achitha/ta-eng-data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_en.md b/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_en.md new file mode 100644 index 00000000000000..bce170fcc7b26e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_rrrrrrrita DistilBertForSequenceClassification from Rrrrrrrita +author: John Snow Labs +name: test_rrrrrrrita +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_rrrrrrrita` is a English model originally trained by Rrrrrrrita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_rrrrrrrita_en_5.5.0_3.0_1726366500817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_rrrrrrrita_en_5.5.0_3.0_1726366500817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_rrrrrrrita","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_rrrrrrrita", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_rrrrrrrita| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rrrrrrrita/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-text_clf_model_v03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-text_clf_model_v03_pipeline_en.md new file mode 100644 index 00000000000000..038d87a2d8d170 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-text_clf_model_v03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_clf_model_v03_pipeline pipeline DistilBertForSequenceClassification from ogamart +author: John Snow Labs +name: text_clf_model_v03_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_clf_model_v03_pipeline` is a English model originally trained by ogamart. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_clf_model_v03_pipeline_en_5.5.0_3.0_1726393896804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_clf_model_v03_pipeline_en_5.5.0_3.0_1726393896804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_clf_model_v03_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_clf_model_v03_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_clf_model_v03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ogamart/text-clf-model-v03 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-tuf_albert_5e_en.md b/docs/_posts/ahmedlone127/2024-09-15-tuf_albert_5e_en.md new file mode 100644 index 00000000000000..6563bbb06cf442 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-tuf_albert_5e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tuf_albert_5e AlbertForSequenceClassification from pig4431 +author: John Snow Labs +name: tuf_albert_5e +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuf_albert_5e` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuf_albert_5e_en_5.5.0_3.0_1726372617334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuf_albert_5e_en_5.5.0_3.0_1726372617334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = AlbertForSequenceClassification.pretrained("tuf_albert_5e","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = AlbertForSequenceClassification.pretrained("tuf_albert_5e", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuf_albert_5e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/pig4431/TUF_ALBERT_5E \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-twitter_roberta_base_sep2021_en.md b/docs/_posts/ahmedlone127/2024-09-15-twitter_roberta_base_sep2021_en.md new file mode 100644 index 00000000000000..8dd2d0e94367cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-twitter_roberta_base_sep2021_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_sep2021 RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_sep2021 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_sep2021` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sep2021_en_5.5.0_3.0_1726383669099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sep2021_en_5.5.0_3.0_1726383669099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("twitter_roberta_base_sep2021","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("twitter_roberta_base_sep2021","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
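+
+If plain Spark ML vectors are needed downstream, an EmbeddingsFinisher stage could be appended to the Python example above (a sketch, assuming the same variable and column names as in that snippet):
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts Spark NLP annotation embeddings into Spark ML vectors.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+pipelineModel = pipeline.fit(data)
+pipelineModel.transform(data) \
+    .selectExpr("explode(finished_embeddings) as vector") \
+    .show(truncate=60)
+```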
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_sep2021| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-sep2021 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_base_cer_gn.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_base_cer_gn.md new file mode 100644 index 00000000000000..2ad0986ff1a1b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_base_cer_gn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Guarani whisper_base_cer WhisperForCTC from adrianSauer +author: John Snow Labs +name: whisper_base_cer +date: 2024-09-15 +tags: [gn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: gn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_cer` is a Guarani model originally trained by adrianSauer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_cer_gn_5.5.0_3.0_1726407224087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_cer_gn_5.5.0_3.0_1726407224087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_cer","gn") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_cer", "gn")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
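+
+The `data` DataFrame used above is not defined in the snippet; one possible way to build it (a sketch, assuming 16 kHz mono audio and the optional librosa dependency) is shown below:
+
+```python
+import librosa
+
+# Load a local file as a float array; AudioAssembler reads the raw samples
+# from the "audio_content" column. The file path is illustrative.
+samples, sampling_rate = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([[samples.tolist()]], ["audio_content"])
+```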
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_cer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|gn| +|Size:|642.3 MB| + +## References + +https://huggingface.co/adrianSauer/whisper-base-cer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_amet_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_amet_en.md new file mode 100644 index 00000000000000..0728096d40cb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_amet_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_amet WhisperForCTC from drmeeseeks +author: John Snow Labs +name: whisper_small_amet +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_amet` is a English model originally trained by drmeeseeks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_amet_en_5.5.0_3.0_1726419992221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_amet_en_5.5.0_3.0_1726419992221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_amet","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_amet", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_amet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/drmeeseeks/whisper-small-amet \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_chv_pre_cv.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_chv_pre_cv.md new file mode 100644 index 00000000000000..eb086bc0cfe79d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_chv_pre_cv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chuvash whisper_small_chv_pre WhisperForCTC from alexantonov +author: John Snow Labs +name: whisper_small_chv_pre +date: 2024-09-15 +tags: [cv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: cv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chv_pre` is a Chuvash model originally trained by alexantonov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chv_pre_cv_5.5.0_3.0_1726390113446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chv_pre_cv_5.5.0_3.0_1726390113446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_chv_pre","cv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_chv_pre", "cv")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chv_pre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|cv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alexantonov/whisper-small-chv-pre \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_korean_zeroth_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_korean_zeroth_pipeline_ko.md new file mode 100644 index 00000000000000..8eea6c46947c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_korean_zeroth_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_small_korean_zeroth_pipeline pipeline WhisperForCTC from seastar105 +author: John Snow Labs +name: whisper_small_korean_zeroth_pipeline +date: 2024-09-15 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_zeroth_pipeline` is a Korean model originally trained by seastar105. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_zeroth_pipeline_ko_5.5.0_3.0_1726388415077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_zeroth_pipeline_ko_5.5.0_3.0_1726388415077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_korean_zeroth_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_korean_zeroth_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_zeroth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seastar105/whisper-small-ko-zeroth + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_nepali_jenrish_ne.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_nepali_jenrish_ne.md new file mode 100644 index 00000000000000..cbe8f0c2a498bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_nepali_jenrish_ne.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Nepali (macrolanguage) whisper_small_nepali_jenrish WhisperForCTC from jenrish +author: John Snow Labs +name: whisper_small_nepali_jenrish +date: 2024-09-15 +tags: [ne, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ne +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepali_jenrish` is a Nepali (macrolanguage) model originally trained by jenrish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepali_jenrish_ne_5.5.0_3.0_1726388496679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepali_jenrish_ne_5.5.0_3.0_1726388496679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_nepali_jenrish","ne") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_nepali_jenrish", "ne")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepali_jenrish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ne| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jenrish/whisper-small-ne \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_danielizham_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_danielizham_pipeline_es.md new file mode 100644 index 00000000000000..d5bbb14ee25e93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_danielizham_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_danielizham_pipeline pipeline WhisperForCTC from danielizham +author: John Snow Labs +name: whisper_small_spanish_danielizham_pipeline +date: 2024-09-15 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_danielizham_pipeline` is a Castilian, Spanish model originally trained by danielizham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_danielizham_pipeline_es_5.5.0_3.0_1726390116451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_danielizham_pipeline_es_5.5.0_3.0_1726390116451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_spanish_danielizham_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_spanish_danielizham_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_danielizham_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/danielizham/whisper-small-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_swahili_hedronstone_sw.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_swahili_hedronstone_sw.md new file mode 100644 index 00000000000000..23a2ff4c7d0e90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_swahili_hedronstone_sw.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swahili (macrolanguage) whisper_small_swahili_hedronstone WhisperForCTC from hedronstone +author: John Snow Labs +name: whisper_small_swahili_hedronstone +date: 2024-09-15 +tags: [sw, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swahili_hedronstone` is a Swahili (macrolanguage) model originally trained by hedronstone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swahili_hedronstone_sw_5.5.0_3.0_1726416257816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swahili_hedronstone_sw_5.5.0_3.0_1726416257816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_swahili_hedronstone","sw") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_swahili_hedronstone", "sw")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swahili_hedronstone| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sw| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hedronstone/whisper-small-sw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_swedish_northern_sami_irena_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_swedish_northern_sami_irena_pipeline_hi.md new file mode 100644 index 00000000000000..1621645593d669 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_swedish_northern_sami_irena_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_swedish_northern_sami_irena_pipeline pipeline WhisperForCTC from irena +author: John Snow Labs +name: whisper_small_swedish_northern_sami_irena_pipeline +date: 2024-09-15 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_northern_sami_irena_pipeline` is a Hindi model originally trained by irena. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_northern_sami_irena_pipeline_hi_5.5.0_3.0_1726420424902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_northern_sami_irena_pipeline_hi_5.5.0_3.0_1726420424902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_swedish_northern_sami_irena_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_swedish_northern_sami_irena_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_northern_sami_irena_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/irena/whisper-small-sv-SE + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_urdu_hadiqa123_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_urdu_hadiqa123_en.md new file mode 100644 index 00000000000000..3225b9551a2f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_urdu_hadiqa123_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_urdu_hadiqa123 WhisperForCTC from hadiqa123 +author: John Snow Labs +name: whisper_small_urdu_hadiqa123 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_urdu_hadiqa123` is a English model originally trained by hadiqa123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_urdu_hadiqa123_en_5.5.0_3.0_1726390603537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_urdu_hadiqa123_en_5.5.0_3.0_1726390603537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_urdu_hadiqa123","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_urdu_hadiqa123", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_urdu_hadiqa123| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hadiqa123/whisper-small-ur \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_urdu_howmannymore_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_urdu_howmannymore_pipeline_en.md new file mode 100644 index 00000000000000..7c76b739877f19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_urdu_howmannymore_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_urdu_howmannymore_pipeline pipeline WhisperForCTC from HowMannyMore +author: John Snow Labs +name: whisper_small_urdu_howmannymore_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_urdu_howmannymore_pipeline` is a English model originally trained by HowMannyMore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_urdu_howmannymore_pipeline_en_5.5.0_3.0_1726419552844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_urdu_howmannymore_pipeline_en_5.5.0_3.0_1726419552844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_urdu_howmannymore_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_urdu_howmannymore_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_urdu_howmannymore_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HowMannyMore/whisper-small-urdu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_english_minds14_aryan_401_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_english_minds14_aryan_401_en.md new file mode 100644 index 00000000000000..9185bee07b2490 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_english_minds14_aryan_401_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_minds14_aryan_401 WhisperForCTC from Aryan-401 +author: John Snow Labs +name: whisper_tiny_english_minds14_aryan_401 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_minds14_aryan_401` is a English model originally trained by Aryan-401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_minds14_aryan_401_en_5.5.0_3.0_1726425621929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_minds14_aryan_401_en_5.5.0_3.0_1726425621929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_english_minds14_aryan_401","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_minds14_aryan_401", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_minds14_aryan_401| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Aryan-401/whisper-tiny-english-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_mongolian_cafet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_mongolian_cafet_pipeline_en.md new file mode 100644 index 00000000000000..4a6f5902dd24c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_mongolian_cafet_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_mongolian_cafet_pipeline pipeline WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_tiny_mongolian_cafet_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_mongolian_cafet_pipeline` is a English model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_mongolian_cafet_pipeline_en_5.5.0_3.0_1726425173849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_mongolian_cafet_pipeline_en_5.5.0_3.0_1726425173849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_mongolian_cafet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_mongolian_cafet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
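+
+`df` is assumed to exist above. A minimal sketch, assuming the audio is already available as a Python list of floats sampled at 16 kHz (`waveform` is a placeholder name) and that the transcription lands in a `text` column, as in the standalone Whisper cards:
+
+```python
+# AudioAssembler inside the pipeline expects the raw waveform in "audio_content"
+df = spark.createDataFrame([(waveform,)], ["audio_content"])
+annotations = pipeline.transform(df)
+annotations.select("text.result").show(truncate=False)
+```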
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_mongolian_cafet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.2 MB| + +## References + +https://huggingface.co/Cafet/whisper-tiny-mongolian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline_en.md new file mode 100644 index 00000000000000..a9989046303403 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline pipeline WhisperForCTC from davidilag +author: John Snow Labs +name: whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline` is a English model originally trained by davidilag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline_en_5.5.0_3.0_1726410252424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline_en_5.5.0_3.0_1726410252424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
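+
+As above, `df` must be prepared by the caller. One possible sketch, assuming a 16 kHz waveform held in a Python list of floats (`waveform` is illustrative) and a `text` output column, matching the standalone Whisper cards:
+
+```python
+# The pipeline's first stage (AudioAssembler) reads from "audio_content"
+df = spark.createDataFrame([(waveform,)], ["audio_content"])
+annotations = pipeline.transform(df)
+annotations.select("text.result").show(truncate=False)
+```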
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_norwegian_faroese_100h_5k_steps_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.2 MB| + +## References + +https://huggingface.co/davidilag/whisper-tiny-no-fo-100h-5k-steps_v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_russian_machinetrofimov_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_russian_machinetrofimov_pipeline_ru.md new file mode 100644 index 00000000000000..79d436703bf72d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_russian_machinetrofimov_pipeline_ru.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Russian whisper_tiny_russian_machinetrofimov_pipeline pipeline WhisperForCTC from MachineTrofimov +author: John Snow Labs +name: whisper_tiny_russian_machinetrofimov_pipeline +date: 2024-09-15 +tags: [ru, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_russian_machinetrofimov_pipeline` is a Russian model originally trained by MachineTrofimov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_russian_machinetrofimov_pipeline_ru_5.5.0_3.0_1726421144770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_russian_machinetrofimov_pipeline_ru_5.5.0_3.0_1726421144770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_russian_machinetrofimov_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_russian_machinetrofimov_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
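+
+The snippet assumes `df` already exists. A minimal sketch, assuming a 16 kHz Russian recording loaded as a float list (`waveform` is a placeholder) and that the pipeline's transcription column is `text`:
+
+```python
+# "audio_content" feeds the AudioAssembler stage bundled in this pipeline
+df = spark.createDataFrame([(waveform,)], ["audio_content"])
+annotations = pipeline.transform(df)
+annotations.select("text.result").show(truncate=False)
+```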
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_russian_machinetrofimov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|390.1 MB| + +## References + +https://huggingface.co/MachineTrofimov/whisper-tiny-ru + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_thai_th.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_thai_th.md new file mode 100644 index 00000000000000..4e90c75684212c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_thai_th.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Thai whisper_tiny_thai WhisperForCTC from juierror +author: John Snow Labs +name: whisper_tiny_thai +date: 2024-09-15 +tags: [th, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_thai` is a Thai model originally trained by juierror. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_th_5.5.0_3.0_1726411698736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_th_5.5.0_3.0_1726411698736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_thai","th") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with an "audio_content" column holding the waveform as an array of floats (see the sketch below)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_thai", "th")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
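+
+A minimal sketch for the missing `data`, assuming a local 16 kHz mono Thai recording and the `librosa` package (both assumptions for illustration only):
+
+```python
+import librosa
+
+# Raw waveform as a float array, consumed by AudioAssembler from "audio_content"
+waveform, _ = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([(waveform.tolist(),)], ["audio_content"])
+
+# The transcription produced above sits in the "text" annotation column
+pipelineDF.select("text.result").show(truncate=False)
+```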
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_thai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|th| +|Size:|390.4 MB| + +## References + +https://huggingface.co/juierror/whisper-tiny-thai \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_turkish_ckandemir_tr.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_turkish_ckandemir_tr.md new file mode 100644 index 00000000000000..f3ebdcec83aa3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_turkish_ckandemir_tr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Turkish whisper_tiny_turkish_ckandemir WhisperForCTC from ckandemir +author: John Snow Labs +name: whisper_tiny_turkish_ckandemir +date: 2024-09-15 +tags: [tr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_turkish_ckandemir` is a Turkish model originally trained by ckandemir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_turkish_ckandemir_tr_5.5.0_3.0_1726410222510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_turkish_ckandemir_tr_5.5.0_3.0_1726410222510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_turkish_ckandemir","tr") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with an "audio_content" column holding the waveform as an array of floats (see the sketch below)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_turkish_ckandemir", "tr")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
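+
+The example above expects a `data` DataFrame. One way to build it, assuming a local 16 kHz mono recording and the `librosa` package (illustrative assumptions, not part of this card):
+
+```python
+import librosa
+
+# Load the waveform as a float array for the "audio_content" column
+waveform, _ = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([(waveform.tolist(),)], ["audio_content"])
+
+# Once transformed, read the transcription from the "text" annotation column
+pipelineDF.select("text.result").show(truncate=False)
+```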
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_turkish_ckandemir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tr| +|Size:|389.3 MB| + +## References + +https://huggingface.co/ckandemir/whisper-tiny-tr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_v2_default_small_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_v2_default_small_en.md new file mode 100644 index 00000000000000..517ade4a374e62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_v2_default_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_v2_default_small WhisperForCTC from karinthommen +author: John Snow Labs +name: whisper_v2_default_small +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_v2_default_small` is a English model originally trained by karinthommen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_v2_default_small_en_5.5.0_3.0_1726430112927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_v2_default_small_en_5.5.0_3.0_1726430112927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_v2_default_small","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with an "audio_content" column holding the waveform as an array of floats (see the sketch below)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_v2_default_small", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
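+
+`data` is left undefined above. A minimal sketch, assuming a local 16 kHz mono recording and the `librosa` package (both assumptions for illustration):
+
+```python
+import librosa
+
+# Float-array waveform for AudioAssembler's "audio_content" input
+waveform, _ = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([(waveform.tolist(),)], ["audio_content"])
+
+# Transcriptions are returned in the "text" annotation column
+pipelineDF.select("text.result").show(truncate=False)
+```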
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_v2_default_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/karinthommen/whisper-V2-default-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-withinapps_ndd_dimeshift_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-15-withinapps_ndd_dimeshift_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..15e62caeafee28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-withinapps_ndd_dimeshift_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_dimeshift_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_dimeshift_test_tags_cwadj +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_dimeshift_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_dimeshift_test_tags_cwadj_en_5.5.0_3.0_1726366396913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_dimeshift_test_tags_cwadj_en_5.5.0_3.0_1726366396913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_dimeshift_test_tags_cwadj","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_dimeshift_test_tags_cwadj", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
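+
+A quick way to inspect the prediction for the toy sentence, assuming the transform above completed:
+
+```python
+# "class" holds the label annotations produced by the classifier stage
+pipelineDF.select("text", "class.result").show(truncate=False)
+```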
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_dimeshift_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-dimeshift_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_4_pipeline_en.md new file mode 100644 index 00000000000000..5c3a063b15ba80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_4_pipeline pipeline XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_4_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_4_pipeline` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_4_pipeline_en_5.5.0_3.0_1726374832675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_4_pipeline_en_5.5.0_3.0_1726374832675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
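+
+`df` is assumed to exist above. A minimal sketch, reusing the toy sentence from the standalone model cards and assuming the pipeline's classifier writes to a `class` column:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```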
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1_en.md new file mode 100644 index 00000000000000..ecb44aa43f3bc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1 XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1_en_5.5.0_3.0_1726433486046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1_en_5.5.0_3.0_1726433486046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
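+
+To check the predicted label on the toy sentence (a sketch, assuming the transform above ran):
+
+```python
+# The classifier stage writes its label annotations to "class"
+pipelineDF.select("text", "class.result").show(truncate=False)
+```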
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinyarwanda_sent1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-finetuned-kin-sent1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_amitjain171980_en.md new file mode 100644 index 00000000000000..3ee0a5ee2fdc50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_amitjain171980 +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_amitjain171980` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_amitjain171980_en_5.5.0_3.0_1726362739912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_amitjain171980_en_5.5.0_3.0_1726362739912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_amitjain171980","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_amitjain171980", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
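+
+To eyeball the output, assuming the transform above completed (tokens and tags are aligned by position):
+
+```python
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```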
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_amitjain171980| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_monkdalma_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_monkdalma_en.md new file mode 100644 index 00000000000000..9e4ba0e29210a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_monkdalma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_monkdalma XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_monkdalma +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_monkdalma` is a English model originally trained by MonkDalma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_monkdalma_en_5.5.0_3.0_1726370502086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_monkdalma_en_5.5.0_3.0_1726370502086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_monkdalma","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_monkdalma", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
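+
+A quick sanity check on the NER output, assuming the transform above ran (token and tag arrays line up by position):
+
+```python
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```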
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_monkdalma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_abdus_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_abdus_en.md new file mode 100644 index 00000000000000..93338def89468d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_abdus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_abdus XlmRoBertaForTokenClassification from abdus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_abdus +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_abdus` is a English model originally trained by abdus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_abdus_en_5.5.0_3.0_1726362999928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_abdus_en_5.5.0_3.0_1726362999928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_abdus","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_abdus", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
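+
+To inspect the predicted entity tags, assuming the transform above completed:
+
+```python
+# Tokens and their tags, aligned by position
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```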
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_abdus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/abdus/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_haesun_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_haesun_en.md new file mode 100644 index 00000000000000..b6cd98a893a7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_haesun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_haesun XlmRoBertaForTokenClassification from haesun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_haesun +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_haesun` is a English model originally trained by haesun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_haesun_en_5.5.0_3.0_1726362157811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_haesun_en_5.5.0_3.0_1726362157811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_haesun","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_haesun", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
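+
+A short follow-up to view the tags, assuming the transform above ran:
+
+```python
+# Token and NER tag arrays are parallel
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```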
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_haesun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/haesun/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_udon3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_udon3_pipeline_en.md new file mode 100644 index 00000000000000..82a2314da20742 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_french_udon3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_udon3_pipeline pipeline XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_udon3_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_udon3_pipeline` is a English model originally trained by udon3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_udon3_pipeline_en_5.5.0_3.0_1726362320880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_udon3_pipeline_en_5.5.0_3.0_1726362320880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_udon3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_udon3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
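+
+`df` is assumed to exist above. A minimal sketch, reusing the toy sentence from the standalone model cards and assuming the pipeline's token classifier writes to a `ner` column:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("ner.result").show(truncate=False)
+```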
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_udon3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline_en.md new file mode 100644 index 00000000000000..46f2d8c983c782 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline pipeline XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline` is a English model originally trained by ajit-transformer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline_en_5.5.0_3.0_1726361552849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline_en_5.5.0_3.0_1726361552849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
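+
+As above, `df` must be supplied by the caller. A sketch with the toy sentence used elsewhere in these cards, assuming the NER output column is named `ner`:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("ner.result").show(truncate=False)
+```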
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ajit_transformer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.5 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline_en.md new file mode 100644 index 00000000000000..c8d8ce146fd3ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline pipeline XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline_en_5.5.0_3.0_1726361536087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline_en_5.5.0_3.0_1726361536087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
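+
+A minimal sketch for the missing `df`, reusing the toy sentence from the standalone cards and assuming the pipeline outputs its tags in a `ner` column:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("ner.result").show(truncate=False)
+```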
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ligerre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_scionk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_scionk_pipeline_en.md new file mode 100644 index 00000000000000..eec7fb29fe72a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_scionk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_scionk_pipeline pipeline XlmRoBertaForTokenClassification from scionk +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_scionk_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_scionk_pipeline` is a English model originally trained by scionk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_scionk_pipeline_en_5.5.0_3.0_1726361566142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_scionk_pipeline_en_5.5.0_3.0_1726361566142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_scionk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_scionk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
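+
+`df` is not defined above. One way to build it, using the toy sentence from the standalone cards and assuming the token classifier's output column is `ner`:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("ner.result").show(truncate=False)
+```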
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_scionk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/scionk/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ysige_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ysige_en.md new file mode 100644 index 00000000000000..716d15dde1dec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_italian_ysige_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ysige XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ysige +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ysige` is a English model originally trained by ysige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ysige_en_5.5.0_3.0_1726362746599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ysige_en_5.5.0_3.0_1726362746599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ysige","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ysige", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
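+
+To view the predicted tags, assuming the transform above completed:
+
+```python
+# Parallel arrays of tokens and NER tags
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```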
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ysige| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline_en.md new file mode 100644 index 00000000000000..b6fd038cc4239f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline_en_5.5.0_3.0_1726432887672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline_en_5.5.0_3.0_1726432887672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
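+
+The snippet assumes a `df` with raw text. A minimal sketch, reusing the toy sentence from the standalone model cards and assuming the sentiment classifier writes its labels to a `class` column:
+
+```python
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```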
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_english_trimmed_english_50000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-en-trimmed-en-50000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_longformer_base_4096_repnum_wl_3_classes_fr.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_longformer_base_4096_repnum_wl_3_classes_fr.md new file mode 100644 index 00000000000000..ac92b0c82e46c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_longformer_base_4096_repnum_wl_3_classes_fr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: French xlm_roberta_longformer_base_4096_repnum_wl_3_classes XlmRoBertaForSequenceClassification from waboucay +author: John Snow Labs +name: xlm_roberta_longformer_base_4096_repnum_wl_3_classes +date: 2024-09-15 +tags: [fr, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_longformer_base_4096_repnum_wl_3_classes` is a French model originally trained by waboucay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_repnum_wl_3_classes_fr_5.5.0_3.0_1726373499425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_repnum_wl_3_classes_fr_5.5.0_3.0_1726373499425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_longformer_base_4096_repnum_wl_3_classes","fr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_longformer_base_4096_repnum_wl_3_classes", "fr")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_longformer_base_4096_repnum_wl_3_classes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fr| +|Size:|1.1 GB| + +## References + +https://huggingface.co/waboucay/xlm-roberta-longformer-base-4096-repnum_wl_3_classes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_ner_shopsign_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_ner_shopsign_pipeline_en.md new file mode 100644 index 00000000000000..b02dcfcc05584a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_ner_shopsign_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_ner_shopsign_pipeline pipeline XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_roberta_ner_shopsign_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_ner_shopsign_pipeline` is a English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_ner_shopsign_pipeline_en_5.5.0_3.0_1726398436203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_ner_shopsign_pipeline_en_5.5.0_3.0_1726398436203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_ner_shopsign_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_ner_shopsign_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
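+
+The `df` referenced above is not defined in the snippet; a minimal way to build a suitable input (any DataFrame with a `text` column works, sample sentence is illustrative only) is:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame with a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_ner_shopsign_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```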
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_ner_shopsign_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.2 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-roberta-ner-shopsign + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlmr_romanian_english_all_shuffled_42_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlmr_romanian_english_all_shuffled_42_test1000_en.md new file mode 100644 index 00000000000000..e9db7026eea6c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlmr_romanian_english_all_shuffled_42_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_romanian_english_all_shuffled_42_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_romanian_english_all_shuffled_42_test1000 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_romanian_english_all_shuffled_42_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_42_test1000_en_5.5.0_3.0_1726441192132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_42_test1000_en_5.5.0_3.0_1726441192132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_romanian_english_all_shuffled_42_test1000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_romanian_english_all_shuffled_42_test1000", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_romanian_english_all_shuffled_42_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|820.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ro-en-all_shuffled-42-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlmrobertatask2_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlmrobertatask2_en.md new file mode 100644 index 00000000000000..8ed5fdf2c6816b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlmrobertatask2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmrobertatask2 XlmRoBertaForSequenceClassification from DianaTurmakhan +author: John Snow Labs +name: xlmrobertatask2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmrobertatask2` is a English model originally trained by DianaTurmakhan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmrobertatask2_en_5.5.0_3.0_1726441857901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmrobertatask2_en_5.5.0_3.0_1726441857901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertatask2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertatask2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmrobertatask2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|819.3 MB| + +## References + +https://huggingface.co/DianaTurmakhan/XLMRobertaTask2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-afriberta_base_finetuned_igbo_en.md b/docs/_posts/ahmedlone127/2024-09-16-afriberta_base_finetuned_igbo_en.md new file mode 100644 index 00000000000000..930c74d30d1928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-afriberta_base_finetuned_igbo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_base_finetuned_igbo XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_finetuned_igbo +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_base_finetuned_igbo` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_en_5.5.0_3.0_1726496367854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_en_5.5.0_3.0_1726496367854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_igbo","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_igbo", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_finetuned_igbo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|415.2 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-finetuned-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-afro_xlmr_base_hausa_2e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-afro_xlmr_base_hausa_2e_5_pipeline_en.md new file mode 100644 index 00000000000000..385dfef0b8dd75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-afro_xlmr_base_hausa_2e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_2e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_2e_5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_hausa_2e_5_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_2e_5_pipeline_en_5.5.0_3.0_1726498094799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_2e_5_pipeline_en_5.5.0_3.0_1726498094799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_base_hausa_2e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_base_hausa_2e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
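+
+As in the other cards, the example expects an input DataFrame `df` with a `text` column; one possible sketch (sample text is illustrative only):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("afro_xlmr_base_hausa_2e_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```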
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_2e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-2e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ag_news_classification_distillbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-ag_news_classification_distillbert_pipeline_en.md new file mode 100644 index 00000000000000..c82a2d95a94c36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ag_news_classification_distillbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ag_news_classification_distillbert_pipeline pipeline DistilBertForSequenceClassification from cornelliusyudhawijaya +author: John Snow Labs +name: ag_news_classification_distillbert_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_classification_distillbert_pipeline` is a English model originally trained by cornelliusyudhawijaya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_classification_distillbert_pipeline_en_5.5.0_3.0_1726506402085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_classification_distillbert_pipeline_en_5.5.0_3.0_1726506402085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ag_news_classification_distillbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ag_news_classification_distillbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
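+
+The snippet above assumes a Spark DataFrame `df` with a `text` column already exists; a minimal sketch of preparing one (the sample sentence is illustrative only) could look like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("ag_news_classification_distillbert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```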
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_classification_distillbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cornelliusyudhawijaya/AG_News_Classification_DistillBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ag_news_roberta_large_seed_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-ag_news_roberta_large_seed_1_pipeline_en.md new file mode 100644 index 00000000000000..96cab52e0ca29b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ag_news_roberta_large_seed_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ag_news_roberta_large_seed_1_pipeline pipeline RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: ag_news_roberta_large_seed_1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_roberta_large_seed_1_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_roberta_large_seed_1_pipeline_en_5.5.0_3.0_1726518608139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_roberta_large_seed_1_pipeline_en_5.5.0_3.0_1726518608139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ag_news_roberta_large_seed_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ag_news_roberta_large_seed_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
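+
+The `df` referenced above is not defined in the snippet; a minimal way to build a suitable input (any DataFrame with a `text` column works, sample sentence is illustrative only) is:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame with a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("ag_news_roberta_large_seed_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```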
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_roberta_large_seed_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/ag_news_roberta-large_seed-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ahisto_ner_model_tds1_witiko_en.md b/docs/_posts/ahmedlone127/2024-09-16-ahisto_ner_model_tds1_witiko_en.md new file mode 100644 index 00000000000000..f4e34be853046c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ahisto_ner_model_tds1_witiko_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ahisto_ner_model_tds1_witiko XlmRoBertaForTokenClassification from witiko +author: John Snow Labs +name: ahisto_ner_model_tds1_witiko +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ahisto_ner_model_tds1_witiko` is a English model originally trained by witiko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ahisto_ner_model_tds1_witiko_en_5.5.0_3.0_1726494927524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ahisto_ner_model_tds1_witiko_en_5.5.0_3.0_1726494927524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ahisto_ner_model_tds1_witiko","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ahisto_ner_model_tds1_witiko", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ahisto_ner_model_tds1_witiko| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/witiko/ahisto-ner-model-tds1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ahisto_ner_model_tds1_witiko_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-ahisto_ner_model_tds1_witiko_pipeline_en.md new file mode 100644 index 00000000000000..a2114651445df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ahisto_ner_model_tds1_witiko_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ahisto_ner_model_tds1_witiko_pipeline pipeline XlmRoBertaForTokenClassification from witiko +author: John Snow Labs +name: ahisto_ner_model_tds1_witiko_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ahisto_ner_model_tds1_witiko_pipeline` is a English model originally trained by witiko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ahisto_ner_model_tds1_witiko_pipeline_en_5.5.0_3.0_1726494978234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ahisto_ner_model_tds1_witiko_pipeline_en_5.5.0_3.0_1726494978234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ahisto_ner_model_tds1_witiko_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ahisto_ner_model_tds1_witiko_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
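+
+As in the other cards, the example expects an input DataFrame `df` with a `text` column; one possible sketch (sample text is illustrative only):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("ahisto_ner_model_tds1_witiko_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```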
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ahisto_ner_model_tds1_witiko_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/witiko/ahisto-ner-model-tds1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ai_text_detector_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-ai_text_detector_pipeline_en.md new file mode 100644 index 00000000000000..4b4e082ce61e06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ai_text_detector_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_text_detector_pipeline pipeline BertForSequenceClassification from yongchao +author: John Snow Labs +name: ai_text_detector_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_text_detector_pipeline` is a English model originally trained by yongchao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_text_detector_pipeline_en_5.5.0_3.0_1726493352102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_text_detector_pipeline_en_5.5.0_3.0_1726493352102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ai_text_detector_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ai_text_detector_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
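+
+The snippet above assumes a Spark DataFrame `df` with a `text` column already exists; a minimal sketch of preparing one (the sample sentence is illustrative only) could look like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("ai_text_detector_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```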
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_text_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yongchao/ai_text_detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-akan_tiny_2000ms_1k_pipeline_ak.md b/docs/_posts/ahmedlone127/2024-09-16-akan_tiny_2000ms_1k_pipeline_ak.md new file mode 100644 index 00000000000000..f6886a8daa0324 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-akan_tiny_2000ms_1k_pipeline_ak.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Akan akan_tiny_2000ms_1k_pipeline pipeline WhisperForCTC from devkyle +author: John Snow Labs +name: akan_tiny_2000ms_1k_pipeline +date: 2024-09-16 +tags: [ak, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ak +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`akan_tiny_2000ms_1k_pipeline` is a Akan model originally trained by devkyle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akan_tiny_2000ms_1k_pipeline_ak_5.5.0_3.0_1726476566938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akan_tiny_2000ms_1k_pipeline_ak_5.5.0_3.0_1726476566938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("akan_tiny_2000ms_1k_pipeline", lang = "ak") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("akan_tiny_2000ms_1k_pipeline", lang = "ak") +val annotations = pipeline.transform(df) + +``` +
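+
+This is a speech-recognition pipeline, so the `df` referenced above must contain audio samples rather than text. A minimal sketch, assuming `librosa` is available to read a local WAV file and that the input column follows Spark NLP's `AudioAssembler` default (`audio_content`):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+import librosa  # assumption: used only to load the audio file into float samples
+
+audio, _ = librosa.load("sample.wav", sr=16000)  # hypothetical local file
+df = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("akan_tiny_2000ms_1k_pipeline", lang = "ak")
+annotations = pipeline.transform(df)
+```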
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|akan_tiny_2000ms_1k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ak| +|Size:|389.6 MB| + +## References + +https://huggingface.co/devkyle/Akan-tiny-2000ms-1k + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_model_03_en.md b/docs/_posts/ahmedlone127/2024-09-16-albert_model_03_en.md new file mode 100644 index 00000000000000..baa3a446896e90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_model_03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_model_03 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model_03 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model_03` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model_03_en_5.5.0_3.0_1726506510795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model_03_en_5.5.0_3.0_1726506510795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model_03","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model_03", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model_03| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model_03 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_small_kor_cross_encoder_v1_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-16-albert_small_kor_cross_encoder_v1_pipeline_ko.md new file mode 100644 index 00000000000000..7065e152c25fdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_small_kor_cross_encoder_v1_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean albert_small_kor_cross_encoder_v1_pipeline pipeline BertForSequenceClassification from bongsoo +author: John Snow Labs +name: albert_small_kor_cross_encoder_v1_pipeline +date: 2024-09-16 +tags: [ko, open_source, pipeline, onnx] +task: Text Classification +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_small_kor_cross_encoder_v1_pipeline` is a Korean model originally trained by bongsoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_small_kor_cross_encoder_v1_pipeline_ko_5.5.0_3.0_1726459196662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_small_kor_cross_encoder_v1_pipeline_ko_5.5.0_3.0_1726459196662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_small_kor_cross_encoder_v1_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_small_kor_cross_encoder_v1_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
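+
+The snippet above assumes a Spark DataFrame `df` with a `text` column already exists. A minimal sketch of preparing one is shown below; the sample sentence is illustrative only, and in practice the input would be Korean text.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("albert_small_kor_cross_encoder_v1_pipeline", lang = "ko")
+annotations = pipeline.transform(df)
+```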
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_small_kor_cross_encoder_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|43.9 MB| + +## References + +https://huggingface.co/bongsoo/albert-small-kor-cross-encoder-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_auto_and_commute_8_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_auto_and_commute_8_16_5_oos_en.md new file mode 100644 index 00000000000000..b754778eec5fdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_auto_and_commute_8_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_auto_and_commute_8_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_auto_and_commute_8_16_5_oos +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_auto_and_commute_8_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_8_16_5_oos_en_5.5.0_3.0_1726504671739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_8_16_5_oos_en_5.5.0_3.0_1726504671739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_8_16_5_oos","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_8_16_5_oos", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_auto_and_commute_8_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-auto_and_commute-8-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline_en.md new file mode 100644 index 00000000000000..9172a396451903 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline_en_5.5.0_3.0_1726504732007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline_en_5.5.0_3.0_1726504732007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
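+
+The `df` referenced above is not defined in the snippet; a minimal way to build a suitable input (any DataFrame with a `text` column works, sample sentence is illustrative only) is:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame with a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```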
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_auto_and_commute_8_16_5_oos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-auto_and_commute-8-16-5-oos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_home_9_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_home_9_16_5_en.md new file mode 100644 index 00000000000000..d353d68641e7c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_home_9_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_home_9_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_9_16_5 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_9_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_9_16_5_en_5.5.0_3.0_1726527420310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_9_16_5_en_5.5.0_3.0_1726527420310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_9_16_5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_9_16_5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_9_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-9-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_home_9_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_home_9_16_5_pipeline_en.md new file mode 100644 index 00000000000000..5b9b37d55278a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_home_9_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_home_9_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_9_16_5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_9_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_9_16_5_pipeline_en_5.5.0_3.0_1726527484833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_9_16_5_pipeline_en_5.5.0_3.0_1726527484833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_home_9_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_home_9_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
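+
+As in the other cards, the example expects an input DataFrame `df` with a `text` column; one possible sketch (sample text is illustrative only):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("all_roberta_large_v1_home_9_16_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```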
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_9_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-9-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-alvaro_marian_finetuned_italian_russian_en.md b/docs/_posts/ahmedlone127/2024-09-16-alvaro_marian_finetuned_italian_russian_en.md new file mode 100644 index 00000000000000..fc30550a4a8428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-alvaro_marian_finetuned_italian_russian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alvaro_marian_finetuned_italian_russian MarianTransformer from Rooshan +author: John Snow Labs +name: alvaro_marian_finetuned_italian_russian +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alvaro_marian_finetuned_italian_russian` is a English model originally trained by Rooshan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alvaro_marian_finetuned_italian_russian_en_5.5.0_3.0_1726493876879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alvaro_marian_finetuned_italian_russian_en_5.5.0_3.0_1726493876879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("alvaro_marian_finetuned_italian_russian","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("alvaro_marian_finetuned_italian_russian","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alvaro_marian_finetuned_italian_russian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/Rooshan/Alvaro-marian_finetuned_it_ru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_en.md b/docs/_posts/ahmedlone127/2024-09-16-analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_en.md new file mode 100644 index 00000000000000..0ed40d0774f68a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris RoBertaForSequenceClassification from SampsonChris +author: John Snow Labs +name: analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris` is a English model originally trained by SampsonChris. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_en_5.5.0_3.0_1726518739137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_en_5.5.0_3.0_1726518739137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/SampsonChris/Analysis_on_socialmedia_sentiment_on_vaccines \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline_en.md new file mode 100644 index 00000000000000..94585894421119 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline pipeline RoBertaForSequenceClassification from SampsonChris +author: John Snow Labs +name: analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline` is a English model originally trained by SampsonChris. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline_en_5.5.0_3.0_1726518761383.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline_en_5.5.0_3.0_1726518761383.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
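+
+The snippet above assumes a Spark DataFrame `df` with a `text` column already exists; a minimal sketch of preparing one (the sample sentence is illustrative only) could look like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```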
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|analysis_on_socialmedia_sentiment_on_vaccines_sampsonchris_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/SampsonChris/Analysis_on_socialmedia_sentiment_on_vaccines + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ascle_english_french_ufal_marianmt_en.md b/docs/_posts/ahmedlone127/2024-09-16-ascle_english_french_ufal_marianmt_en.md new file mode 100644 index 00000000000000..53f28cb5520155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ascle_english_french_ufal_marianmt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ascle_english_french_ufal_marianmt MarianTransformer from li-lab +author: John Snow Labs +name: ascle_english_french_ufal_marianmt +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ascle_english_french_ufal_marianmt` is a English model originally trained by li-lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ascle_english_french_ufal_marianmt_en_5.5.0_3.0_1726493997215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ascle_english_french_ufal_marianmt_en_5.5.0_3.0_1726493997215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("ascle_english_french_ufal_marianmt","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("ascle_english_french_ufal_marianmt","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
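+
+A minimal sketch for reading the translations back out, assuming the `pipelineDF` DataFrame from the Python example above:
+
+```python
+import pyspark.sql.functions as F
+
+# One annotation per detected sentence; "result" holds the translated text.
+pipelineDF \
+    .select(F.explode("translation.result").alias("translation_text")) \
+    .show(truncate=False)
+```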
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ascle_english_french_ufal_marianmt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|506.8 MB| + +## References + +https://huggingface.co/li-lab/ascle-en-fr-UFAL-MarianMT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-autotrain_citizen_nlu_bengali_1370652766_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-16-autotrain_citizen_nlu_bengali_1370652766_pipeline_bn.md new file mode 100644 index 00000000000000..738c131017f923 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-autotrain_citizen_nlu_bengali_1370652766_pipeline_bn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bengali autotrain_citizen_nlu_bengali_1370652766_pipeline pipeline RoBertaForSequenceClassification from neuralspace +author: John Snow Labs +name: autotrain_citizen_nlu_bengali_1370652766_pipeline +date: 2024-09-16 +tags: [bn, open_source, pipeline, onnx] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_citizen_nlu_bengali_1370652766_pipeline` is a Bengali model originally trained by neuralspace. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_citizen_nlu_bengali_1370652766_pipeline_bn_5.5.0_3.0_1726517973445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_citizen_nlu_bengali_1370652766_pipeline_bn_5.5.0_3.0_1726517973445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_citizen_nlu_bengali_1370652766_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_citizen_nlu_bengali_1370652766_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_citizen_nlu_bengali_1370652766_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|312.0 MB| + +## References + +https://huggingface.co/neuralspace/autotrain-citizen_nlu_bn-1370652766 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-banglabert_en.md b/docs/_posts/ahmedlone127/2024-09-16-banglabert_en.md new file mode 100644 index 00000000000000..038798a81eeabf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-banglabert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English banglabert BertForQuestionAnswering from doerig +author: John Snow Labs +name: banglabert +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert` is a English model originally trained by doerig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_en_5.5.0_3.0_1726489994273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_en_5.5.0_3.0_1726489994273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("banglabert","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("banglabert", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
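+
+A minimal sketch for retrieving the predicted answer span, assuming the `pipelineDF` DataFrame from the Python example above:
+
+```python
+# "answer.result" is an array holding the extracted answer text for each row.
+pipelineDF.select("answer.result").show(truncate=False)
+```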
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|412.2 MB| + +## References + +https://huggingface.co/doerig/banglabert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-banglabert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-banglabert_pipeline_en.md new file mode 100644 index 00000000000000..666301776281e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-banglabert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English banglabert_pipeline pipeline BertForQuestionAnswering from doerig +author: John Snow Labs +name: banglabert_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert_pipeline` is a English model originally trained by doerig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_pipeline_en_5.5.0_3.0_1726490014346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_pipeline_en_5.5.0_3.0_1726490014346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("banglabert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("banglabert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.2 MB| + +## References + +https://huggingface.co/doerig/banglabert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_alban12_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_alban12_en.md new file mode 100644 index 00000000000000..527e8d087a8db1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_alban12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_alban12 BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_base_cased_finetuned_alban12 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_alban12` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_alban12_en_5.5.0_3.0_1726520191320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_alban12_en_5.5.0_3.0_1726520191320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_alban12","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_alban12", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
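+
+A minimal sketch for viewing the token-level predictions, assuming the `pipelineDF` DataFrame from the Python example above; tokens and tags stay aligned by position:
+
+```python
+# "token.result" and "ner.result" are parallel arrays of tokens and IOB tags.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```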
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_alban12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|420.3 MB| + +## References + +https://huggingface.co/alban12/bert-base-cased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_multinerd_systema_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_multinerd_systema_en.md new file mode 100644 index 00000000000000..4d0d2f0385d9f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_multinerd_systema_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_multinerd_systema BertForTokenClassification from medxiaorudan +author: John Snow Labs +name: bert_base_cased_finetuned_multinerd_systema +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_multinerd_systema` is a English model originally trained by medxiaorudan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_multinerd_systema_en_5.5.0_3.0_1726520339742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_multinerd_systema_en_5.5.0_3.0_1726520339742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_multinerd_systema","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_multinerd_systema", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_multinerd_systema| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/medxiaorudan/bert-base-cased-finetuned-MultiNERD-SystemA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_multinerd_systema_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_multinerd_systema_pipeline_en.md new file mode 100644 index 00000000000000..e57e53819a92cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_cased_finetuned_multinerd_systema_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_finetuned_multinerd_systema_pipeline pipeline BertForTokenClassification from medxiaorudan +author: John Snow Labs +name: bert_base_cased_finetuned_multinerd_systema_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_multinerd_systema_pipeline` is a English model originally trained by medxiaorudan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_multinerd_systema_pipeline_en_5.5.0_3.0_1726520358817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_multinerd_systema_pipeline_en_5.5.0_3.0_1726520358817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_finetuned_multinerd_systema_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_multinerd_systema_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_multinerd_systema_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/medxiaorudan/bert-base-cased-finetuned-MultiNERD-SystemA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_german_cased_cimt_location_de.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_german_cased_cimt_location_de.md new file mode 100644 index 00000000000000..ad297a707a8cf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_german_cased_cimt_location_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German bert_base_german_cased_cimt_location BertForTokenClassification from juliaromberg +author: John Snow Labs +name: bert_base_german_cased_cimt_location +date: 2024-09-16 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_cimt_location` is a German model originally trained by juliaromberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_cimt_location_de_5.5.0_3.0_1726454365687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_cimt_location_de_5.5.0_3.0_1726454365687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_cimt_location","de") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_cimt_location", "de")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_cimt_location| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/juliaromberg/bert-base-german-cased_cimt-location \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_pipeline_en.md new file mode 100644 index 00000000000000..f6ea6ca55a4d03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_tweet_topic_classification_pipeline pipeline BertForSequenceClassification from GeeDino +author: John Snow Labs +name: bert_base_tweet_topic_classification_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_tweet_topic_classification_pipeline` is a English model originally trained by GeeDino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_tweet_topic_classification_pipeline_en_5.5.0_3.0_1726499181384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_tweet_topic_classification_pipeline_en_5.5.0_3.0_1726499181384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_tweet_topic_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_tweet_topic_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_tweet_topic_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/GeeDino/bert-base-tweet-topic-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_uncased_squadv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_uncased_squadv2_pipeline_en.md new file mode 100644 index 00000000000000..cfa22b9b281077 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_uncased_squadv2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_squadv2_pipeline pipeline BertForQuestionAnswering from Pennywise881 +author: John Snow Labs +name: bert_base_uncased_squadv2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squadv2_pipeline` is a English model originally trained by Pennywise881. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squadv2_pipeline_en_5.5.0_3.0_1726511101243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squadv2_pipeline_en_5.5.0_3.0_1726511101243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_squadv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_squadv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squadv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Pennywise881/bert-base-uncased-squadv2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_book_genre_classification_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_book_genre_classification_en.md new file mode 100644 index 00000000000000..f8410b3a57f804 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_book_genre_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_book_genre_classification BertForSequenceClassification from statworx +author: John Snow Labs +name: bert_book_genre_classification +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_book_genre_classification` is a English model originally trained by statworx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_book_genre_classification_en_5.5.0_3.0_1726492846408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_book_genre_classification_en_5.5.0_3.0_1726492846408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_book_genre_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_book_genre_classification", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_book_genre_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/statworx/bert-book-genre-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_cbisi_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_cbisi_en.md new file mode 100644 index 00000000000000..8f9d1dee2943a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_cbisi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_cbisi BertForSequenceClassification from AndersB3334 +author: John Snow Labs +name: bert_cbisi +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_cbisi` is a English model originally trained by AndersB3334. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_cbisi_en_5.5.0_3.0_1726498625045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_cbisi_en_5.5.0_3.0_1726498625045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_cbisi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_cbisi", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_cbisi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/AndersB3334/BERT-CBISI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_cbisi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_cbisi_pipeline_en.md new file mode 100644 index 00000000000000..acbcefac6bce08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_cbisi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_cbisi_pipeline pipeline BertForSequenceClassification from AndersB3334 +author: John Snow Labs +name: bert_cbisi_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_cbisi_pipeline` is a English model originally trained by AndersB3334. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_cbisi_pipeline_en_5.5.0_3.0_1726498645665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_cbisi_pipeline_en_5.5.0_3.0_1726498645665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_cbisi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_cbisi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_cbisi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/AndersB3334/BERT-CBISI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_leg_al_corpus_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_leg_al_corpus_en.md new file mode 100644 index 00000000000000..57cbd5f3e53f9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_leg_al_corpus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_leg_al_corpus RoBertaEmbeddings from desarrolloasesoreslocales +author: John Snow Labs +name: bert_leg_al_corpus +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_leg_al_corpus` is a English model originally trained by desarrolloasesoreslocales. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_leg_al_corpus_en_5.5.0_3.0_1726513984974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_leg_al_corpus_en_5.5.0_3.0_1726513984974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bert_leg_al_corpus","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bert_leg_al_corpus","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
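+
+To pass the vectors on to downstream Spark ML stages, an `EmbeddingsFinisher` can convert the annotation column into plain vectors. A minimal sketch, assuming the `pipelineDF` DataFrame from the Python example above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts the "embeddings" annotations into Spark ML vectors, one per token.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(truncate=True)
+```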
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_leg_al_corpus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|692.2 MB| + +## References + +https://huggingface.co/desarrolloasesoreslocales/bert-leg-al-corpus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_phishing_text_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_phishing_text_en.md new file mode 100644 index 00000000000000..3c6b8b08f067de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_phishing_text_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_phishing_text BertForSequenceClassification from ealvaradob +author: John Snow Labs +name: bert_phishing_text +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_phishing_text` is a English model originally trained by ealvaradob. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_phishing_text_en_5.5.0_3.0_1726493439468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_phishing_text_en_5.5.0_3.0_1726493439468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_phishing_text","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_phishing_text", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_phishing_text| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ealvaradob/bert-phishing-text \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_phishing_text_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_phishing_text_pipeline_en.md new file mode 100644 index 00000000000000..5714ab38312135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_phishing_text_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_phishing_text_pipeline pipeline BertForSequenceClassification from ealvaradob +author: John Snow Labs +name: bert_phishing_text_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_phishing_text_pipeline` is a English model originally trained by ealvaradob. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_phishing_text_pipeline_en_5.5.0_3.0_1726493498671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_phishing_text_pipeline_en_5.5.0_3.0_1726493498671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_phishing_text_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_phishing_text_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_phishing_text_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ealvaradob/bert-phishing-text + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-berturk_128k_cased_immigration_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-berturk_128k_cased_immigration_pipeline_en.md new file mode 100644 index 00000000000000..0aaca252339049 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-berturk_128k_cased_immigration_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English berturk_128k_cased_immigration_pipeline pipeline BertForSequenceClassification from AlkanCan +author: John Snow Labs +name: berturk_128k_cased_immigration_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berturk_128k_cased_immigration_pipeline` is a English model originally trained by AlkanCan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berturk_128k_cased_immigration_pipeline_en_5.5.0_3.0_1726462875433.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berturk_128k_cased_immigration_pipeline_en_5.5.0_3.0_1726462875433.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("berturk_128k_cased_immigration_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("berturk_128k_cased_immigration_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berturk_128k_cased_immigration_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|691.2 MB| + +## References + +https://huggingface.co/AlkanCan/BERTurk-128k-cased-immigration + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_augmented_en.md new file mode 100644 index 00000000000000..ad544ec242febf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_augmented_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertweet_large_augmented RoBertaForSequenceClassification from dvruette +author: John Snow Labs +name: bertweet_large_augmented +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_augmented` is a English model originally trained by dvruette. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_augmented_en_5.5.0_3.0_1726527694729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_augmented_en_5.5.0_3.0_1726527694729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("bertweet_large_augmented","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bertweet_large_augmented", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dvruette/bertweet-large_augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_augmented_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_augmented_pipeline_en.md new file mode 100644 index 00000000000000..f3e334faa65b05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_augmented_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertweet_large_augmented_pipeline pipeline RoBertaForSequenceClassification from dvruette +author: John Snow Labs +name: bertweet_large_augmented_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_augmented_pipeline` is a English model originally trained by dvruette. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_augmented_pipeline_en_5.5.0_3.0_1726527759609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_augmented_pipeline_en_5.5.0_3.0_1726527759609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertweet_large_augmented_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertweet_large_augmented_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dvruette/bertweet-large_augmented + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline_en.md new file mode 100644 index 00000000000000..11408c1902bf82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline_en_5.5.0_3.0_1726470327542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline_en_5.5.0_3.0_1726470327542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_epoch6_batch4_lr2e_05_w0_005_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/bertweet-large_epoch6_batch4_lr2e-05_w0.005 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn_en.md b/docs/_posts/ahmedlone127/2024-09-16-bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn_en.md new file mode 100644 index 00000000000000..151a07c03c44f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn BertForSequenceClassification from CNT-UPenn +author: John Snow Labs +name: bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn` is a English model originally trained by CNT-UPenn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn_en_5.5.0_3.0_1726459913778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn_en_5.5.0_3.0_1726459913778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
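Once the pipeline has run, the predicted label for each row can be read from the `class` annotation column configured above; a short follow-up sketch using the same column names:

```python
# "class" is the output column set on the classifier above; .result holds the
# predicted label string(s) for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```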
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bio_clinicalbert_for_seizurefreedom_classification_cnt_upenn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.3 MB| + +## References + +https://huggingface.co/CNT-UPenn/Bio_ClinicalBERT_for_seizureFreedom_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-biobert_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-biobert_qa_pipeline_en.md new file mode 100644 index 00000000000000..e35ea97ee6919c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-biobert_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English biobert_qa_pipeline pipeline BertForQuestionAnswering from hagara +author: John Snow Labs +name: biobert_qa_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_qa_pipeline` is a English model originally trained by hagara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_qa_pipeline_en_5.5.0_3.0_1726511079245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_qa_pipeline_en_5.5.0_3.0_1726511079245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biobert_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biobert_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hagara/biobert-qa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-biolinkbert_mednli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-biolinkbert_mednli_pipeline_en.md new file mode 100644 index 00000000000000..79a490ad1ab593 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-biolinkbert_mednli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biolinkbert_mednli_pipeline pipeline BertForSequenceClassification from cnut1648 +author: John Snow Labs +name: biolinkbert_mednli_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biolinkbert_mednli_pipeline` is a English model originally trained by cnut1648. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biolinkbert_mednli_pipeline_en_5.5.0_3.0_1726459854969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biolinkbert_mednli_pipeline_en_5.5.0_3.0_1726459854969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biolinkbert_mednli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biolinkbert_mednli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biolinkbert_mednli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/cnut1648/biolinkbert-mednli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-biomedroberta_finetuned_valid_testing_0_00005_32_en.md b/docs/_posts/ahmedlone127/2024-09-16-biomedroberta_finetuned_valid_testing_0_00005_32_en.md new file mode 100644 index 00000000000000..8f35d4eb624ed5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-biomedroberta_finetuned_valid_testing_0_00005_32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biomedroberta_finetuned_valid_testing_0_00005_32 RoBertaForTokenClassification from pabRomero +author: John Snow Labs +name: biomedroberta_finetuned_valid_testing_0_00005_32 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomedroberta_finetuned_valid_testing_0_00005_32` is a English model originally trained by pabRomero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_0_00005_32_en_5.5.0_3.0_1726452456854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_0_00005_32_en_5.5.0_3.0_1726452456854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("biomedroberta_finetuned_valid_testing_0_00005_32","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("biomedroberta_finetuned_valid_testing_0_00005_32", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomedroberta_finetuned_valid_testing_0_00005_32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pabRomero/BioMedRoBERTa-finetuned-valid-testing-0.00005-32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-blueteam_baseline_en.md b/docs/_posts/ahmedlone127/2024-09-16-blueteam_baseline_en.md new file mode 100644 index 00000000000000..107e82bf2e8db4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-blueteam_baseline_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English blueteam_baseline DistilBertForSequenceClassification from sbhatti2009 +author: John Snow Labs +name: blueteam_baseline +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`blueteam_baseline` is a English model originally trained by sbhatti2009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/blueteam_baseline_en_5.5.0_3.0_1726506123905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/blueteam_baseline_en_5.5.0_3.0_1726506123905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("blueteam_baseline","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("blueteam_baseline", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|blueteam_baseline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/sbhatti2009/blueteam-baseline \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bmg_translation_lug_english_v4_en.md b/docs/_posts/ahmedlone127/2024-09-16-bmg_translation_lug_english_v4_en.md new file mode 100644 index 00000000000000..fbc16c7c81cded --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bmg_translation_lug_english_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bmg_translation_lug_english_v4 MarianTransformer from atwine +author: John Snow Labs +name: bmg_translation_lug_english_v4 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bmg_translation_lug_english_v4` is a English model originally trained by atwine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bmg_translation_lug_english_v4_en_5.5.0_3.0_1726494191859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bmg_translation_lug_english_v4_en_5.5.0_3.0_1726494191859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("bmg_translation_lug_english_v4","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("bmg_translation_lug_english_v4","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
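With the column names wired up as above, the translated text ends up in the `translation` annotation column. The model name suggests Luganda-to-English translation, so realistic inputs would normally be Luganda sentences rather than the English placeholder; a short follow-up sketch:

```python
# One translated string per sentence found by SentenceDetectorDLModel;
# "translation" is the output column configured on the MarianTransformer above.
pipelineDF.select("translation.result").show(truncate=False)
```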
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bmg_translation_lug_english_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|513.4 MB| + +## References + +https://huggingface.co/atwine/bmg-translation-lug-en-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-boolq_paws_en1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-boolq_paws_en1000_pipeline_en.md new file mode 100644 index 00000000000000..4519112cb96eaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-boolq_paws_en1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English boolq_paws_en1000_pipeline pipeline RoBertaForSequenceClassification from yeyejmm +author: John Snow Labs +name: boolq_paws_en1000_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`boolq_paws_en1000_pipeline` is a English model originally trained by yeyejmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/boolq_paws_en1000_pipeline_en_5.5.0_3.0_1726527833489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/boolq_paws_en1000_pipeline_en_5.5.0_3.0_1726527833489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("boolq_paws_en1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("boolq_paws_en1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|boolq_paws_en1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|460.1 MB| + +## References + +https://huggingface.co/yeyejmm/BoolQ-PAWS-en1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_distemist_iic_es.md b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_distemist_iic_es.md new file mode 100644 index 00000000000000..095329b2903178 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_distemist_iic_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_distemist_iic RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_distemist_iic +date: 2024-09-16 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_distemist_iic` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_iic_es_5.5.0_3.0_1726482353743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_iic_es_5.5.0_3.0_1726482353743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_distemist_iic","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_distemist_iic", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
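Since this is a Spanish clinical NER model, realistic inputs are Spanish text rather than the English placeholder sentence; a small follow-up sketch with a purely illustrative Spanish example (the sentence is an assumption, not taken from the model card):

```python
# Run the fitted pipeline on an illustrative Spanish sentence and show the
# predicted tag sequence from the "ner" output column configured above.
data_es = spark.createDataFrame([["El paciente presenta fiebre y dolor abdominal."]]).toDF("text")
pipelineModel.transform(data_es).select("ner.result").show(truncate=False)
```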
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_distemist_iic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|432.7 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-distemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_distemist_iic_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_distemist_iic_pipeline_es.md new file mode 100644 index 00000000000000..a9a79b1d9323c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_distemist_iic_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_distemist_iic_pipeline pipeline RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_distemist_iic_pipeline +date: 2024-09-16 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_distemist_iic_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_iic_pipeline_es_5.5.0_3.0_1726482386975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_iic_pipeline_es_5.5.0_3.0_1726482386975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_distemist_iic_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_distemist_iic_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_distemist_iic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|432.7 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-distemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_meddocan_iic_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_meddocan_iic_pipeline_es.md new file mode 100644 index 00000000000000..a4bde7b4eea377 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_meddocan_iic_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_meddocan_iic_pipeline pipeline RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_meddocan_iic_pipeline +date: 2024-09-16 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_meddocan_iic_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_meddocan_iic_pipeline_es_5.5.0_3.0_1726452038084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_meddocan_iic_pipeline_es_5.5.0_3.0_1726452038084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_meddocan_iic_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_meddocan_iic_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_meddocan_iic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|433.3 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-meddocan + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_socialdisner_es.md b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_socialdisner_es.md new file mode 100644 index 00000000000000..a58fdeabe60815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bsc_bio_ehr_spanish_socialdisner_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_socialdisner RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_socialdisner +date: 2024-09-16 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_socialdisner` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_socialdisner_es_5.5.0_3.0_1726482651992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_socialdisner_es_5.5.0_3.0_1726482651992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_socialdisner","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_socialdisner", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_socialdisner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|438.9 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-socialdisner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_amazon_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_amazon_model_pipeline_en.md new file mode 100644 index 00000000000000..95059f5ed6a7f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_amazon_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_amazon_model_pipeline pipeline DistilBertForSequenceClassification from thaodao3101 +author: John Snow Labs +name: burmese_amazon_model_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_amazon_model_pipeline` is a English model originally trained by thaodao3101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_amazon_model_pipeline_en_5.5.0_3.0_1726506517588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_amazon_model_pipeline_en_5.5.0_3.0_1726506517588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_amazon_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_amazon_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_amazon_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thaodao3101/my_amazon_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_model_dannielhcv_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_model_dannielhcv_en.md new file mode 100644 index 00000000000000..9fe8bbe5c06923 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_model_dannielhcv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_dannielhcv DistilBertForSequenceClassification from dannielhcv +author: John Snow Labs +name: burmese_awesome_model_dannielhcv +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_dannielhcv` is a English model originally trained by dannielhcv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dannielhcv_en_5.5.0_3.0_1726506606331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dannielhcv_en_5.5.0_3.0_1726506606331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dannielhcv","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dannielhcv", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_dannielhcv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dannielhcv/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_aanwar_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_aanwar_en.md new file mode 100644 index 00000000000000..0dbec031da7f3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_aanwar_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_aanwar DistilBertForQuestionAnswering from aanwar +author: John Snow Labs +name: burmese_awesome_qa_model_aanwar +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_aanwar` is a English model originally trained by aanwar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_aanwar_en_5.5.0_3.0_1726515436778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_aanwar_en_5.5.0_3.0_1726515436778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_aanwar","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_aanwar", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
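For extractive question answering, the predicted span is written to the `answer` column; a minimal follow-up sketch using the column names from the example above:

```python
# Show each question next to the answer span extracted from its context;
# "answer" is the output column configured on the span classifier above.
pipelineDF.select("question", "answer.result").show(truncate=False)
```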
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_aanwar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/aanwar/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_dianechiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_dianechiang_pipeline_en.md new file mode 100644 index 00000000000000..a62c9a35338360 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_dianechiang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_dianechiang_pipeline pipeline DistilBertForQuestionAnswering from DianeChiang +author: John Snow Labs +name: burmese_awesome_qa_model_dianechiang_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_dianechiang_pipeline` is a English model originally trained by DianeChiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dianechiang_pipeline_en_5.5.0_3.0_1726515059347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dianechiang_pipeline_en_5.5.0_3.0_1726515059347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_dianechiang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_dianechiang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_dianechiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/DianeChiang/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_janmayen_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_janmayen_en.md new file mode 100644 index 00000000000000..63d8681b43dc61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_janmayen_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_janmayen DistilBertForQuestionAnswering from Janmayen +author: John Snow Labs +name: burmese_awesome_qa_model_janmayen +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_janmayen` is a English model originally trained by Janmayen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_janmayen_en_5.5.0_3.0_1726469321937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_janmayen_en_5.5.0_3.0_1726469321937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_janmayen","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_janmayen", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_janmayen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Janmayen/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_janmayen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_janmayen_pipeline_en.md new file mode 100644 index 00000000000000..3ecc16b8168441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_janmayen_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_janmayen_pipeline pipeline DistilBertForQuestionAnswering from Janmayen +author: John Snow Labs +name: burmese_awesome_qa_model_janmayen_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_janmayen_pipeline` is a English model originally trained by Janmayen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_janmayen_pipeline_en_5.5.0_3.0_1726469333862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_janmayen_pipeline_en_5.5.0_3.0_1726469333862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_janmayen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_janmayen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_janmayen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Janmayen/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_jomacgo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_jomacgo_pipeline_en.md new file mode 100644 index 00000000000000..bd31397592c78c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_jomacgo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_jomacgo_pipeline pipeline DistilBertForQuestionAnswering from jomacgo +author: John Snow Labs +name: burmese_awesome_qa_model_jomacgo_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_jomacgo_pipeline` is a English model originally trained by jomacgo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_jomacgo_pipeline_en_5.5.0_3.0_1726469378078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_jomacgo_pipeline_en_5.5.0_3.0_1726469378078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_jomacgo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_jomacgo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_jomacgo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/jomacgo/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_ravikolanpakaa_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_ravikolanpakaa_en.md new file mode 100644 index 00000000000000..0637b9a9719aad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_ravikolanpakaa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ravikolanpakaa RoBertaForQuestionAnswering from ravikolanpakaa +author: John Snow Labs +name: burmese_awesome_qa_model_ravikolanpakaa +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ravikolanpakaa` is a English model originally trained by ravikolanpakaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ravikolanpakaa_en_5.5.0_3.0_1726501874360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ravikolanpakaa_en_5.5.0_3.0_1726501874360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("burmese_awesome_qa_model_ravikolanpakaa","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("burmese_awesome_qa_model_ravikolanpakaa", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ravikolanpakaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ravikolanpakaa/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_ravikolanpakaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_ravikolanpakaa_pipeline_en.md new file mode 100644 index 00000000000000..26427b1675f27e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_ravikolanpakaa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ravikolanpakaa_pipeline pipeline RoBertaForQuestionAnswering from ravikolanpakaa +author: John Snow Labs +name: burmese_awesome_qa_model_ravikolanpakaa_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ravikolanpakaa_pipeline` is a English model originally trained by ravikolanpakaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ravikolanpakaa_pipeline_en_5.5.0_3.0_1726501937061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ravikolanpakaa_pipeline_en_5.5.0_3.0_1726501937061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_ravikolanpakaa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_ravikolanpakaa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ravikolanpakaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ravikolanpakaa/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-business_unit_model_en.md b/docs/_posts/ahmedlone127/2024-09-16-business_unit_model_en.md new file mode 100644 index 00000000000000..5402eacc009c78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-business_unit_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English business_unit_model BertForSequenceClassification from Unmand +author: John Snow Labs +name: business_unit_model +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`business_unit_model` is a English model originally trained by Unmand. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/business_unit_model_en_5.5.0_3.0_1726462981046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/business_unit_model_en_5.5.0_3.0_1726462981046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("business_unit_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("business_unit_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|business_unit_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Unmand/business_unit_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-cds_style_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-16-cds_style_classifier_en.md new file mode 100644 index 00000000000000..a61de219fb0fac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-cds_style_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cds_style_classifier RoBertaForSequenceClassification from hallisky +author: John Snow Labs +name: cds_style_classifier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cds_style_classifier` is a English model originally trained by hallisky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cds_style_classifier_en_5.5.0_3.0_1726519215086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cds_style_classifier_en_5.5.0_3.0_1726519215086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("cds_style_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cds_style_classifier", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cds_style_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hallisky/cds_style_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-cds_style_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-cds_style_classifier_pipeline_en.md new file mode 100644 index 00000000000000..84452801f2d2ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-cds_style_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cds_style_classifier_pipeline pipeline RoBertaForSequenceClassification from hallisky +author: John Snow Labs +name: cds_style_classifier_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cds_style_classifier_pipeline` is a English model originally trained by hallisky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cds_style_classifier_pipeline_en_5.5.0_3.0_1726519284808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cds_style_classifier_pipeline_en_5.5.0_3.0_1726519284808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cds_style_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cds_style_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cds_style_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hallisky/cds_style_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-check_sec_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-check_sec_pipeline_en.md new file mode 100644 index 00000000000000..74847a1a8f98a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-check_sec_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English check_sec_pipeline pipeline BertForSequenceClassification from huolongguo10 +author: John Snow Labs +name: check_sec_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`check_sec_pipeline` is a English model originally trained by huolongguo10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/check_sec_pipeline_en_5.5.0_3.0_1726499223751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/check_sec_pipeline_en_5.5.0_3.0_1726499223751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("check_sec_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("check_sec_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|check_sec_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/huolongguo10/check_sec + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-cino_large_v2_tusa_en.md b/docs/_posts/ahmedlone127/2024-09-16-cino_large_v2_tusa_en.md new file mode 100644 index 00000000000000..159cb3cc707281 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-cino_large_v2_tusa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cino_large_v2_tusa XlmRoBertaForSequenceClassification from UTibetNLP +author: John Snow Labs +name: cino_large_v2_tusa +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cino_large_v2_tusa` is a English model originally trained by UTibetNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cino_large_v2_tusa_en_5.5.0_3.0_1726516265634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cino_large_v2_tusa_en_5.5.0_3.0_1726516265634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("cino_large_v2_tusa","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("cino_large_v2_tusa", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
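+
+Beyond the predicted label, the classifier annotation also carries a metadata map (for example per-class scores). A sketch for inspecting it with plain Spark SQL functions, assuming `pipelineDF` from the Python example above:
+
+```python
+from pyspark.sql import functions as F
+
+# Explode the "class" annotation array and display each prediction's
+# result together with its metadata map.
+pipelineDF.select(F.explode("class").alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```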
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cino_large_v2_tusa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/UTibetNLP/cino-large-v2_TUSA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-cino_large_v2_tusa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-cino_large_v2_tusa_pipeline_en.md new file mode 100644 index 00000000000000..c99d1282fa6515 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-cino_large_v2_tusa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cino_large_v2_tusa_pipeline pipeline XlmRoBertaForSequenceClassification from UTibetNLP +author: John Snow Labs +name: cino_large_v2_tusa_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cino_large_v2_tusa_pipeline` is a English model originally trained by UTibetNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cino_large_v2_tusa_pipeline_en_5.5.0_3.0_1726516351451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cino_large_v2_tusa_pipeline_en_5.5.0_3.0_1726516351451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cino_large_v2_tusa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cino_large_v2_tusa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cino_large_v2_tusa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/UTibetNLP/cino-large-v2_TUSA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-climate_attitude_lm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-climate_attitude_lm_pipeline_en.md new file mode 100644 index 00000000000000..6c9a54285d5a2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-climate_attitude_lm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English climate_attitude_lm_pipeline pipeline RoBertaForSequenceClassification from KevSun +author: John Snow Labs +name: climate_attitude_lm_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_attitude_lm_pipeline` is a English model originally trained by KevSun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_attitude_lm_pipeline_en_5.5.0_3.0_1726455555791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_attitude_lm_pipeline_en_5.5.0_3.0_1726455555791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("climate_attitude_lm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("climate_attitude_lm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_attitude_lm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/KevSun/climate-attitude-LM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-code_human_ai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-code_human_ai_pipeline_en.md new file mode 100644 index 00000000000000..accad3e3fc6910 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-code_human_ai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_human_ai_pipeline pipeline RoBertaForSequenceClassification from lebretou +author: John Snow Labs +name: code_human_ai_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_human_ai_pipeline` is a English model originally trained by lebretou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_human_ai_pipeline_en_5.5.0_3.0_1726526936804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_human_ai_pipeline_en_5.5.0_3.0_1726526936804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_human_ai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_human_ai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
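+
+As in the other pipeline snippets, `df` is assumed to already exist. A minimal sketch of creating one, assuming the pipeline reads free text from a `text` column as the standalone examples in this series do:
+
+```python
+# Hypothetical single-row input frame with a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```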
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_human_ai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|458.6 MB| + +## References + +https://huggingface.co/lebretou/code-human-ai + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_3_random_trimmed_with_g_and_spaces_en.md b/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_3_random_trimmed_with_g_and_spaces_en.md new file mode 100644 index 00000000000000..cb6614588fd085 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_3_random_trimmed_with_g_and_spaces_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_3_random_trimmed_with_g_and_spaces RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_3_random_trimmed_with_g_and_spaces +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_3_random_trimmed_with_g_and_spaces` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_3_random_trimmed_with_g_and_spaces_en_5.5.0_3.0_1726530416307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_3_random_trimmed_with_g_and_spaces_en_5.5.0_3.0_1726530416307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_3_random_trimmed_with_g_and_spaces","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_3_random_trimmed_with_g_and_spaces", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
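+
+For token classification the predicted tags are aligned one-to-one with the tokens, so a quick sanity check is to print both columns side by side. A sketch assuming `pipelineDF` from the Python snippet above:
+
+```python
+# Each row shows the token sequence and the corresponding predicted labels.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```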
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_3_random_trimmed_with_g_and_spaces| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_3_random_trimmed_with_g_and_spaces \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline_en.md new file mode 100644 index 00000000000000..f7d48458adad60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline_en_5.5.0_3.0_1726530437783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline_en_5.5.0_3.0_1726530437783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_3_random_trimmed_with_g_and_spaces_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_3_random_trimmed_with_g_and_spaces + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-cold_fusion_finetuned_sufficiency_dagstuhl_en.md b/docs/_posts/ahmedlone127/2024-09-16-cold_fusion_finetuned_sufficiency_dagstuhl_en.md new file mode 100644 index 00000000000000..7497e6ac7bb32e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-cold_fusion_finetuned_sufficiency_dagstuhl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_finetuned_sufficiency_dagstuhl RoBertaForSequenceClassification from jakub014 +author: John Snow Labs +name: cold_fusion_finetuned_sufficiency_dagstuhl +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_finetuned_sufficiency_dagstuhl` is a English model originally trained by jakub014. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_finetuned_sufficiency_dagstuhl_en_5.5.0_3.0_1726527052250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_finetuned_sufficiency_dagstuhl_en_5.5.0_3.0_1726527052250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_finetuned_sufficiency_dagstuhl","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_finetuned_sufficiency_dagstuhl", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_finetuned_sufficiency_dagstuhl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/jakub014/ColD-Fusion-finetuned-sufficiency-dagstuhl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-custserviceclassifier_en.md b/docs/_posts/ahmedlone127/2024-09-16-custserviceclassifier_en.md new file mode 100644 index 00000000000000..ac4d46dce2723e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-custserviceclassifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English custserviceclassifier DistilBertForSequenceClassification from jonmanly +author: John Snow Labs +name: custserviceclassifier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custserviceclassifier` is a English model originally trained by jonmanly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custserviceclassifier_en_5.5.0_3.0_1726506282163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custserviceclassifier_en_5.5.0_3.0_1726506282163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("custserviceclassifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("custserviceclassifier", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custserviceclassifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jonmanly/custServiceClassifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-dataequity_opus_maltese_spanish_english_en.md b/docs/_posts/ahmedlone127/2024-09-16-dataequity_opus_maltese_spanish_english_en.md new file mode 100644 index 00000000000000..bb56ebc72637c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-dataequity_opus_maltese_spanish_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dataequity_opus_maltese_spanish_english MarianTransformer from dataequity +author: John Snow Labs +name: dataequity_opus_maltese_spanish_english +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataequity_opus_maltese_spanish_english` is a English model originally trained by dataequity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_spanish_english_en_5.5.0_3.0_1726503516879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_spanish_english_en_5.5.0_3.0_1726503516879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("dataequity_opus_maltese_spanish_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("dataequity_opus_maltese_spanish_english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
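+
+With the columns wired as above, the generated translations land in the `translation` column. A minimal check, assuming the Python pipeline above has produced `pipelineDF`:
+
+```python
+# Print the source text next to the translated sentences.
+pipelineDF.select("text", "translation.result").show(truncate=False)
+```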
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataequity_opus_maltese_spanish_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.4 MB| + +## References + +https://huggingface.co/dataequity/dataequity-opus-mt-es-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-dataequity_opus_maltese_spanish_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-dataequity_opus_maltese_spanish_english_pipeline_en.md new file mode 100644 index 00000000000000..adca5c95f0edbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-dataequity_opus_maltese_spanish_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dataequity_opus_maltese_spanish_english_pipeline pipeline MarianTransformer from dataequity +author: John Snow Labs +name: dataequity_opus_maltese_spanish_english_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataequity_opus_maltese_spanish_english_pipeline` is a English model originally trained by dataequity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_spanish_english_pipeline_en_5.5.0_3.0_1726503540774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_spanish_english_pipeline_en_5.5.0_3.0_1726503540774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dataequity_opus_maltese_spanish_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dataequity_opus_maltese_spanish_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
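+
+For quick experiments the pretrained pipeline can also be applied to plain strings instead of a DataFrame. The sketch below uses `PretrainedPipeline.annotate`, which returns a dictionary keyed by the pipeline's output columns; the exact keys depend on how the pipeline's stages are named.
+
+```python
+# Run the pipeline on a single string and print the resulting annotations.
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```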
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataequity_opus_maltese_spanish_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/dataequity/dataequity-opus-mt-es-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-deep_3_en.md b/docs/_posts/ahmedlone127/2024-09-16-deep_3_en.md new file mode 100644 index 00000000000000..d9535b2db7d537 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-deep_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deep_3 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: deep_3 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deep_3` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deep_3_en_5.5.0_3.0_1726504686271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deep_3_en_5.5.0_3.0_1726504686271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("deep_3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("deep_3", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deep_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Deep_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-deep_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-deep_3_pipeline_en.md new file mode 100644 index 00000000000000..b7a4732c242363 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-deep_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deep_3_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: deep_3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deep_3_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deep_3_pipeline_en_5.5.0_3.0_1726504707677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deep_3_pipeline_en_5.5.0_3.0_1726504707677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deep_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deep_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deep_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Deep_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-definition_classification_v1_en.md b/docs/_posts/ahmedlone127/2024-09-16-definition_classification_v1_en.md new file mode 100644 index 00000000000000..c840e427409e7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-definition_classification_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English definition_classification_v1 RoBertaForSequenceClassification from luisespinosa +author: John Snow Labs +name: definition_classification_v1 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`definition_classification_v1` is a English model originally trained by luisespinosa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/definition_classification_v1_en_5.5.0_3.0_1726455174860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/definition_classification_v1_en_5.5.0_3.0_1726455174860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("definition_classification_v1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("definition_classification_v1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|definition_classification_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|418.6 MB| + +## References + +https://huggingface.co/luisespinosa/definition-classification-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distibert_finetuned_arxiv_multi_label_en.md b/docs/_posts/ahmedlone127/2024-09-16-distibert_finetuned_arxiv_multi_label_en.md new file mode 100644 index 00000000000000..8c05d1884e87e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distibert_finetuned_arxiv_multi_label_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distibert_finetuned_arxiv_multi_label DistilBertForSequenceClassification from Hatoun +author: John Snow Labs +name: distibert_finetuned_arxiv_multi_label +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distibert_finetuned_arxiv_multi_label` is a English model originally trained by Hatoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distibert_finetuned_arxiv_multi_label_en_5.5.0_3.0_1726506889464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distibert_finetuned_arxiv_multi_label_en_5.5.0_3.0_1726506889464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distibert_finetuned_arxiv_multi_label","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distibert_finetuned_arxiv_multi_label", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distibert_finetuned_arxiv_multi_label| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Hatoun/DistiBERT-finetuned-arxiv-multi-label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_1k_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_1k_en.md new file mode 100644 index 00000000000000..c33ba0a8d0a006 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_1k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_1k DistilBertForSequenceClassification from addykan +author: John Snow Labs +name: distilbert_1k +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_1k` is a English model originally trained by addykan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_1k_en_5.5.0_3.0_1726506612217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_1k_en_5.5.0_3.0_1726506612217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_1k","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_1k", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_1k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addykan/distilbert-1k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_akshit_g_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_akshit_g_en.md new file mode 100644 index 00000000000000..d2dc182889fbee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_akshit_g_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_akshit_g BertForQuestionAnswering from akshit-g +author: John Snow Labs +name: distilbert_base_cased_akshit_g +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_akshit_g` is a English model originally trained by akshit-g. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_akshit_g_en_5.5.0_3.0_1726507919871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_akshit_g_en_5.5.0_3.0_1726507919871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("distilbert_base_cased_akshit_g","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("distilbert_base_cased_akshit_g", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
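+
+The extracted answer spans end up in the `answer` column. A small sketch for reading them back, assuming `pipelineDF` from the Python example above:
+
+```python
+# Show each question together with the predicted answer span.
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```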
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_akshit_g| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/akshit-g/distilbert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_akshit_g_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_akshit_g_pipeline_en.md new file mode 100644 index 00000000000000..831012851be5d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_akshit_g_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_cased_akshit_g_pipeline pipeline BertForQuestionAnswering from akshit-g +author: John Snow Labs +name: distilbert_base_cased_akshit_g_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_akshit_g_pipeline` is a English model originally trained by akshit-g. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_akshit_g_pipeline_en_5.5.0_3.0_1726507938137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_akshit_g_pipeline_en_5.5.0_3.0_1726507938137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_akshit_g_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_akshit_g_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_akshit_g_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/akshit-g/distilbert-base-cased + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_lora_merged_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_lora_merged_en.md new file mode 100644 index 00000000000000..1282d7bd381d3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_lora_merged_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_lora_merged DistilBertForQuestionAnswering from JeukHwang +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_lora_merged +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_lora_merged` is a English model originally trained by JeukHwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_lora_merged_en_5.5.0_3.0_1726515444140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_lora_merged_en_5.5.0_3.0_1726515444140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_lora_merged","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_lora_merged", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
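+
+To read the predicted span out of the pipeline output above, the `answer` annotation column can be queried directly; this is a small illustrative addition, not part of the original card:
+
+```python
+# "answer" holds annotation structs; "result" carries the predicted answer text.
+pipelineDF.select("answer.result").show(truncate=False)
+```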
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_lora_merged| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|154.8 MB| + +## References + +https://huggingface.co/JeukHwang/distilbert-base-cased-distilled-squad-lora-merged \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_lora_merged_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_lora_merged_pipeline_en.md new file mode 100644 index 00000000000000..1ebab0e1ea06f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_lora_merged_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_lora_merged_pipeline pipeline DistilBertForQuestionAnswering from JeukHwang +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_lora_merged_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_lora_merged_pipeline` is a English model originally trained by JeukHwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_lora_merged_pipeline_en_5.5.0_3.0_1726515491291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_lora_merged_pipeline_en_5.5.0_3.0_1726515491291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_distilled_squad_lora_merged_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_distilled_squad_lora_merged_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_lora_merged_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|154.8 MB| + +## References + +https://huggingface.co/JeukHwang/distilbert-base-cased-distilled-squad-lora-merged + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_qa_mash_covid_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_qa_mash_covid_en.md new file mode 100644 index 00000000000000..a40a3c3f3472cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_qa_mash_covid_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_qa_mash_covid DistilBertForQuestionAnswering from Eurosmart +author: John Snow Labs +name: distilbert_base_cased_qa_mash_covid +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_qa_mash_covid` is a English model originally trained by Eurosmart. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_qa_mash_covid_en_5.5.0_3.0_1726515418287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_qa_mash_covid_en_5.5.0_3.0_1726515418287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_qa_mash_covid","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_qa_mash_covid", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_qa_mash_covid| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Eurosmart/distilbert-base-cased-qa-mash-covid \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_ccon_mypersonality_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_ccon_mypersonality_en.md new file mode 100644 index 00000000000000..b1e7370d423949 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_ccon_mypersonality_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_ccon_mypersonality DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_ccon_mypersonality +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ccon_mypersonality` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ccon_mypersonality_en_5.5.0_3.0_1726525719008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ccon_mypersonality_en_5.5.0_3.0_1726525719008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ccon_mypersonality","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ccon_mypersonality", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
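+
+A quick way to read the predicted label from the pipeline output above (an illustrative sketch added to the card, not part of the original):
+
+```python
+# "class" holds CATEGORY annotations; "result" is the predicted label and
+# "metadata" carries the per-label scores.
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```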
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ccon_mypersonality| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cCON_mypersonality \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_en.md new file mode 100644 index 00000000000000..c2300ac90057cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_detect_ai_generated_text_lau123 DistilBertForSequenceClassification from Lau123 +author: John Snow Labs +name: distilbert_base_uncased_detect_ai_generated_text_lau123 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_detect_ai_generated_text_lau123` is a English model originally trained by Lau123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_lau123_en_5.5.0_3.0_1726525684469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_lau123_en_5.5.0_3.0_1726525684469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_detect_ai_generated_text_lau123","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_detect_ai_generated_text_lau123", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_detect_ai_generated_text_lau123| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Lau123/distilbert-base-uncased-detect_ai_generated_text \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_en.md new file mode 100644 index 00000000000000..2d29924c4044b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_jeongyeom DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_jeongyeom +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_jeongyeom` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_jeongyeom_en_5.5.0_3.0_1726525576716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_jeongyeom_en_5.5.0_3.0_1726525576716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_jeongyeom","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_jeongyeom", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_jeongyeom| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_squad_geerath_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_squad_geerath_en.md new file mode 100644 index 00000000000000..df5576d62ba4f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_squad_geerath_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_squad_geerath DistilBertForQuestionAnswering from Geerath +author: John Snow Labs +name: distilbert_base_uncased_distilled_squad_geerath +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_squad_geerath` is a English model originally trained by Geerath. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_geerath_en_5.5.0_3.0_1726469538019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_geerath_en_5.5.0_3.0_1726469538019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_distilled_squad_geerath","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_distilled_squad_geerath", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
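+
+For single-example inference without building a DataFrame, the fitted model can be wrapped in a LightPipeline. This is a sketch under the assumption that the two-argument `fullAnnotate(question, context)` form is available in your Spark NLP version:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Driver-side, low-latency inference on a single question/context pair.
+light = LightPipeline(pipelineModel)
+result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
+```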
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_squad_geerath| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Geerath/distilbert-base-uncased-distilled-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_squad_geerath_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_squad_geerath_pipeline_en.md new file mode 100644 index 00000000000000..b6f91c909d547b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_squad_geerath_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_squad_geerath_pipeline pipeline DistilBertForQuestionAnswering from Geerath +author: John Snow Labs +name: distilbert_base_uncased_distilled_squad_geerath_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_squad_geerath_pipeline` is a English model originally trained by Geerath. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_geerath_pipeline_en_5.5.0_3.0_1726469549231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_geerath_pipeline_en_5.5.0_3.0_1726469549231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_squad_geerath_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_squad_geerath_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_squad_geerath_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Geerath/distilbert-base-uncased-distilled-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_bioasq_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_bioasq_en.md new file mode 100644 index 00000000000000..8bc0299d9be452 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_bioasq_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_bioasq DistilBertForQuestionAnswering from jkhsong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_bioasq +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_bioasq` is a English model originally trained by jkhsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bioasq_en_5.5.0_3.0_1726469379373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bioasq_en_5.5.0_3.0_1726469379373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_bioasq","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_bioasq", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_bioasq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/jkhsong/distilbert-base-uncased-finetuned-bioasq \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_bioasq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_bioasq_pipeline_en.md new file mode 100644 index 00000000000000..5f8978f262d71a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_bioasq_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_bioasq_pipeline pipeline DistilBertForQuestionAnswering from jkhsong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_bioasq_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_bioasq_pipeline` is a English model originally trained by jkhsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bioasq_pipeline_en_5.5.0_3.0_1726469391203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bioasq_pipeline_en_5.5.0_3.0_1726469391203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_bioasq_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_bioasq_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_bioasq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/jkhsong/distilbert-base-uncased-finetuned-bioasq + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_holymorphsim_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_holymorphsim_en.md new file mode 100644 index 00000000000000..ad162a20740c46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_holymorphsim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_holymorphsim DistilBertForSequenceClassification from HolyMorphsim +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_holymorphsim +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_holymorphsim` is a English model originally trained by HolyMorphsim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_holymorphsim_en_5.5.0_3.0_1726506303988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_holymorphsim_en_5.5.0_3.0_1726506303988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_holymorphsim","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_holymorphsim", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_holymorphsim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HolyMorphsim/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline_en.md new file mode 100644 index 00000000000000..b44075a12cb8d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline pipeline DistilBertForSequenceClassification from HolyMorphsim +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline` is a English model originally trained by HolyMorphsim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline_en_5.5.0_3.0_1726506315672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline_en_5.5.0_3.0_1726506315672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
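+
+Here `df` is assumed to be a DataFrame with a `text` column; for a quick single-sentence check, `annotate` can be used instead of `transform`. This is an illustrative sketch, not part of the original card:
+
+```python
+# Assumed single-column input for the classification pipeline.
+df = spark.createDataFrame([("I love spark-nlp",)], ["text"])
+annotations = pipeline.transform(df)
+
+# One-off check without a DataFrame; returns a dict of annotator outputs.
+result = pipeline.annotate("I love spark-nlp")
+```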
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_holymorphsim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HolyMorphsim/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline_en.md new file mode 100644 index 00000000000000..3cfdca8bdbdb53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline pipeline DistilBertForSequenceClassification from ka-zu08181 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline` is a English model originally trained by ka-zu08181. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline_en_5.5.0_3.0_1726506511385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline_en_5.5.0_3.0_1726506511385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ka_zu08181_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ka-zu08181/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_kcwong168_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_kcwong168_en.md new file mode 100644 index 00000000000000..eaf94dedc57e16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_kcwong168_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kcwong168 DistilBertForSequenceClassification from KCWONG168 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kcwong168 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kcwong168` is a English model originally trained by KCWONG168. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kcwong168_en_5.5.0_3.0_1726506697259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kcwong168_en_5.5.0_3.0_1726506697259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kcwong168","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kcwong168", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kcwong168| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KCWONG168/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline_en.md new file mode 100644 index 00000000000000..553accb43a9f1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline pipeline DistilBertForSequenceClassification from KCWONG168 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline` is a English model originally trained by KCWONG168. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline_en_5.5.0_3.0_1726506709248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline_en_5.5.0_3.0_1726506709248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kcwong168_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KCWONG168/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_wickelman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_wickelman_pipeline_en.md new file mode 100644 index 00000000000000..8792d099412d08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_wickelman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_wickelman_pipeline pipeline DistilBertForSequenceClassification from Wickelman +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_wickelman_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_wickelman_pipeline` is a English model originally trained by Wickelman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_wickelman_pipeline_en_5.5.0_3.0_1726506192841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_wickelman_pipeline_en_5.5.0_3.0_1726506192841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_wickelman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_wickelman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_wickelman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Wickelman/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline_en.md new file mode 100644 index 00000000000000..26d05b7ff5d822 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline pipeline DistilBertForQuestionAnswering from guoguo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline` is a English model originally trained by guoguo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline_en_5.5.0_3.0_1726469502921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline_en_5.5.0_3.0_1726469502921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_guoguo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/guoguo/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo_en.md new file mode 100644 index 00000000000000..a2eb99703409b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo DistilBertEmbeddings from SayaEndo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo +date: 2024-09-16 +tags: [distilbert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo` is a English model originally trained by SayaEndo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo_en_5.5.0_3.0_1726469338345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo_en_5.5.0_3.0_1726469338345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = DistilBertEmbeddings
+    .pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
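+
+To sanity-check the output, the token-level vectors in the `embeddings` column can be exploded with standard Spark SQL; a minimal sketch assuming the Python pipeline above was run:
+
+```python
+# Each element of "embeddings" is a token-level annotation whose "embeddings"
+# field holds the vector; "result" holds the token text.
+pipelineDF.selectExpr("explode(embeddings) as tok") \
+    .selectExpr("tok.result as token", "size(tok.embeddings) as dim") \
+    .show(truncate=False)
+```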
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_sayaendo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/SayaEndo/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_elephaunt_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_elephaunt_en.md new file mode 100644 index 00000000000000..7bbc4c084ffd3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_elephaunt_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_elephaunt DistilBertForQuestionAnswering from Elephaunt +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_elephaunt +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_elephaunt` is a English model originally trained by Elephaunt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_elephaunt_en_5.5.0_3.0_1726515195030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_elephaunt_en_5.5.0_3.0_1726515195030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_elephaunt","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_elephaunt", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_elephaunt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Elephaunt/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hunniee_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hunniee_en.md new file mode 100644 index 00000000000000..04fe485996138b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hunniee_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hunniee DistilBertForQuestionAnswering from hunniee +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hunniee +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hunniee` is a English model originally trained by hunniee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hunniee_en_5.5.0_3.0_1726515317518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hunniee_en_5.5.0_3.0_1726515317518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hunniee","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hunniee", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hunniee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hunniee/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hyounguk_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hyounguk_en.md new file mode 100644 index 00000000000000..7d17dd46a8b663 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hyounguk_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hyounguk DistilBertForQuestionAnswering from Hyounguk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hyounguk +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hyounguk` is a English model originally trained by Hyounguk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hyounguk_en_5.5.0_3.0_1726469788529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hyounguk_en_5.5.0_3.0_1726469788529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hyounguk","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hyounguk", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hyounguk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Hyounguk/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_markr23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_markr23_pipeline_en.md new file mode 100644 index 00000000000000..b6021795a1b177 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_markr23_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_markr23_pipeline pipeline DistilBertForQuestionAnswering from markr23 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_markr23_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_markr23_pipeline` is a English model originally trained by markr23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_markr23_pipeline_en_5.5.0_3.0_1726469686163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_markr23_pipeline_en_5.5.0_3.0_1726469686163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_markr23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_markr23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
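The snippet above reuses a DataFrame `df` that is not defined in the example. A minimal sketch of how it could be built; the input column names `question` and `context`, and the output column `answer`, are assumptions based on the MultiDocumentAssembler and DistilBertForQuestionAnswering stages listed under Included Models:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Hypothetical input frame; the bundled MultiDocumentAssembler is assumed
# to read "question" and "context" columns.
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"],
)

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_markr23_pipeline", lang="en")
annotations = pipeline.transform(df)

# Column name "answer" is an assumption; check annotations.columns if it differs.
annotations.selectExpr("explode(answer.result) as answer").show(truncate=False)
```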
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_markr23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/markr23/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_riaraju_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_riaraju_en.md new file mode 100644 index 00000000000000..6c93024fe56b0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_riaraju_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_riaraju DistilBertForQuestionAnswering from riaraju +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_riaraju +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_riaraju` is a English model originally trained by riaraju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_riaraju_en_5.5.0_3.0_1726515128375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_riaraju_en_5.5.0_3.0_1726515128375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_riaraju","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_riaraju", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_riaraju| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/riaraju/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_russelltodd_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_russelltodd_en.md new file mode 100644 index 00000000000000..3b327dfebaa966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_russelltodd_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_russelltodd DistilBertForQuestionAnswering from russelltodd +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_russelltodd +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_russelltodd` is a English model originally trained by russelltodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_russelltodd_en_5.5.0_3.0_1726515203362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_russelltodd_en_5.5.0_3.0_1726515203362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_russelltodd","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_russelltodd", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_russelltodd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/russelltodd/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_russelltodd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_russelltodd_pipeline_en.md new file mode 100644 index 00000000000000..5dfe3b622a2e2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_russelltodd_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_russelltodd_pipeline pipeline DistilBertForQuestionAnswering from russelltodd +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_russelltodd_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_russelltodd_pipeline` is a English model originally trained by russelltodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_russelltodd_pipeline_en_5.5.0_3.0_1726515216250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_russelltodd_pipeline_en_5.5.0_3.0_1726515216250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_russelltodd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_russelltodd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_russelltodd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/russelltodd/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_sm750s_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_sm750s_en.md new file mode 100644 index 00000000000000..28b673e2c791ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_sm750s_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_sm750s DistilBertForQuestionAnswering from sm750s +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_sm750s +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_sm750s` is a English model originally trained by sm750s. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sm750s_en_5.5.0_3.0_1726469242539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sm750s_en_5.5.0_3.0_1726469242539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_sm750s","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_sm750s", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
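Once the pipeline has been applied, the prediction is stored in the `answer` annotation column configured above. A short sketch for reading it back; the exact metadata keys (for example a confidence score) are not guaranteed and should be checked per model:

```python
from pyspark.sql import functions as F

# Flatten the annotation array produced by setOutputCol("answer").
answers = pipelineDF.select(F.explode("answer").alias("ann"))

# "result" holds the predicted span text; "metadata" is a map that may
# carry additional keys such as a score (key names vary by annotator).
answers.select(
    F.col("ann.result").alias("predicted_answer"),
    F.col("ann.metadata").alias("metadata"),
).show(truncate=False)
```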
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_sm750s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sm750s/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_en.md new file mode 100644 index 00000000000000..0dbc89cf2a6ccb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_suthanhcong DistilBertForQuestionAnswering from suthanhcong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_suthanhcong +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_suthanhcong` is a English model originally trained by suthanhcong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_suthanhcong_en_5.5.0_3.0_1726515547097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_suthanhcong_en_5.5.0_3.0_1726515547097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_suthanhcong","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_suthanhcong", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_suthanhcong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/suthanhcong/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline_en.md new file mode 100644 index 00000000000000..1b3fcc2a807746 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline pipeline DistilBertForQuestionAnswering from hcy5561 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline_en_5.5.0_3.0_1726469296850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline_en_5.5.0_3.0_1726469296850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_v2_hcy5561_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hcy5561/distilbert-base-uncased-finetuned-squad_v2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_vijmeister_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_vijmeister_pipeline_en.md new file mode 100644 index 00000000000000..2b15fe8d29f7ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_vijmeister_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_vijmeister_pipeline pipeline DistilBertForQuestionAnswering from vijmeister +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_vijmeister_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_vijmeister_pipeline` is a English model originally trained by vijmeister. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_vijmeister_pipeline_en_5.5.0_3.0_1726469577774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_vijmeister_pipeline_en_5.5.0_3.0_1726469577774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_vijmeister_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_vijmeister_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_vijmeister_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/vijmeister/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squard_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squard_v2_pipeline_en.md new file mode 100644 index 00000000000000..3592b68cf85cd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squard_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squard_v2_pipeline pipeline DistilBertForQuestionAnswering from laanhtu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squard_v2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squard_v2_pipeline` is a English model originally trained by laanhtu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squard_v2_pipeline_en_5.5.0_3.0_1726469545467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squard_v2_pipeline_en_5.5.0_3.0_1726469545467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squard_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squard_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squard_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/laanhtu/distilbert-base-uncased-finetuned-squard_v2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_the_fire_flower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_the_fire_flower_pipeline_en.md new file mode 100644 index 00000000000000..6e31704ef73f9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_the_fire_flower_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_the_fire_flower_pipeline pipeline BertEmbeddings from miggwp +author: John Snow Labs +name: distilbert_base_uncased_finetuned_the_fire_flower_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_the_fire_flower_pipeline` is a English model originally trained by miggwp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_the_fire_flower_pipeline_en_5.5.0_3.0_1726472212163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_the_fire_flower_pipeline_en_5.5.0_3.0_1726472212163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_the_fire_flower_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_the_fire_flower_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
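To turn the embedding annotations produced by this pipeline into plain Spark ML vectors, an `EmbeddingsFinisher` can be applied to the transformed DataFrame. A sketch under the assumption that the bundled `BertEmbeddings` stage writes to an `embeddings` column:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert Spark NLP embedding annotations into Spark ML vectors.
# The input column name "embeddings" is an assumption about this pipeline.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(annotations)
finished.selectExpr("explode(finished_embeddings) as vector").show(truncate=False)
```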
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_the_fire_flower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/miggwp/distilbert-base-uncased-finetuned-the-fire-flower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp_en.md new file mode 100644 index 00000000000000..468688f5f4b6a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp_en_5.5.0_3.0_1726506124039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp_en_5.5.0_3.0_1726506124039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
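The predicted label is written to the `class` column set above; one way to read it back after `transform` (the label strings themselves depend on how the original model was fine-tuned):

```python
# Each row carries an array of category annotations; "result" holds the label.
pipelineDF.select("text", "class.result").show(truncate=False)
```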
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1_plprefix0stlarge30_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st30sd_ut72ut1_PLPrefix0stlarge30_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_squad2_p40_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_squad2_p40_en.md new file mode 100644 index 00000000000000..39d26b1bec6515 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_squad2_p40_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p40 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p40 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p40` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p40_en_5.5.0_3.0_1726469577116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p40_en_5.5.0_3.0_1726469577116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p40","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p40", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|199.7 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_squad2_p55_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_squad2_p55_en.md new file mode 100644 index 00000000000000..eef492514dc54c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_squad2_p55_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p55 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p55 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p55` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p55_en_5.5.0_3.0_1726469507641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p55_en_5.5.0_3.0_1726469507641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p55","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p55", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p55| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|177.7 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p55 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..b737c8c6c444de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726506614028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726506614028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
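For quick, single-document inference the fitted model can also be wrapped in a `LightPipeline`, which avoids building a DataFrame for every request. A brief sketch reusing `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by the output columns; "class" holds the label.
result = light.annotate("I love spark-nlp")
print(result["class"])
```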
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut12ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut12ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_ledai0913_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_ledai0913_en.md new file mode 100644 index 00000000000000..e6dcb40768bc9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_ledai0913_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_ledai0913 DistilBertForQuestionAnswering from ledai0913 +author: John Snow Labs +name: distilbert_finetuned_squadv2_ledai0913 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_ledai0913` is a English model originally trained by ledai0913. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_ledai0913_en_5.5.0_3.0_1726469656550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_ledai0913_en_5.5.0_3.0_1726469656550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squadv2_ledai0913","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squadv2_ledai0913", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_ledai0913| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ledai0913/distilbert-finetuned-squadv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_ledai0913_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_ledai0913_pipeline_en.md new file mode 100644 index 00000000000000..f9568791ba9e29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_ledai0913_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_ledai0913_pipeline pipeline DistilBertForQuestionAnswering from ledai0913 +author: John Snow Labs +name: distilbert_finetuned_squadv2_ledai0913_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_ledai0913_pipeline` is a English model originally trained by ledai0913. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_ledai0913_pipeline_en_5.5.0_3.0_1726469668168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_ledai0913_pipeline_en_5.5.0_3.0_1726469668168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_squadv2_ledai0913_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_squadv2_ledai0913_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_ledai0913_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/ledai0913/distilbert-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_tienhuynh_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_tienhuynh_en.md new file mode 100644 index 00000000000000..26e785889c64a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_tienhuynh_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_tienhuynh DistilBertForQuestionAnswering from tienhuynh +author: John Snow Labs +name: distilbert_finetuned_squadv2_tienhuynh +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_tienhuynh` is a English model originally trained by tienhuynh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_tienhuynh_en_5.5.0_3.0_1726469742244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_tienhuynh_en_5.5.0_3.0_1726469742244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squadv2_tienhuynh","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squadv2_tienhuynh", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_tienhuynh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tienhuynh/distilbert-finetuned-squadv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_winvswon78_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_winvswon78_pipeline_en.md new file mode 100644 index 00000000000000..b593f2451d4b48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_winvswon78_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_winvswon78_pipeline pipeline DistilBertForQuestionAnswering from winvswon78 +author: John Snow Labs +name: distilbert_finetuned_squadv2_winvswon78_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_winvswon78_pipeline` is a English model originally trained by winvswon78. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_winvswon78_pipeline_en_5.5.0_3.0_1726469637747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_winvswon78_pipeline_en_5.5.0_3.0_1726469637747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_squadv2_winvswon78_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_squadv2_winvswon78_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_winvswon78_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/winvswon78/distilbert-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_portuguese_optim_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_portuguese_optim_en.md new file mode 100644 index 00000000000000..f4972fe22c05aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_portuguese_optim_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_portuguese_optim DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: distilbert_portuguese_optim +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_portuguese_optim` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_optim_en_5.5.0_3.0_1726469630693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_optim_en_5.5.0_3.0_1726469630693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_portuguese_optim","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_portuguese_optim", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_portuguese_optim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tyavika/DistilBERT-Pt-optim \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_portuguese_optim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_portuguese_optim_pipeline_en.md new file mode 100644 index 00000000000000..710316d15fcc88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_portuguese_optim_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_portuguese_optim_pipeline pipeline DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: distilbert_portuguese_optim_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_portuguese_optim_pipeline` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_optim_pipeline_en_5.5.0_3.0_1726469642361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_optim_pipeline_en_5.5.0_3.0_1726469642361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_portuguese_optim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_portuguese_optim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_portuguese_optim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tyavika/DistilBERT-Pt-optim + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_en.md new file mode 100644 index 00000000000000..bc678a8c4ca6d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_en_5.5.0_3.0_1726525723552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_en_5.5.0_3.0_1726525723552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
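+
+Once `pipelineDF` has been computed as in the Python example, the predicted label and its per-class scores can be pulled out of the `class` annotation column; a brief sketch:
+
+```python
+# Hedged sketch: inspect the prediction produced by the pipeline above.
+# Assumes `pipelineDF` was built exactly as in the Python example.
+from pyspark.sql import functions as F
+
+(pipelineDF
+    .select(F.explode("class").alias("prediction"))
+    .select(
+        F.col("prediction.result").alias("label"),     # predicted class
+        F.col("prediction.metadata").alias("scores"),  # per-label confidence map
+    )
+    .show(truncate=False))
+```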
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qnli_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_turkish_turkish_movie_reviews_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_turkish_turkish_movie_reviews_pipeline_tr.md new file mode 100644 index 00000000000000..b1b242edd6048d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_turkish_turkish_movie_reviews_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_movie_reviews_pipeline pipeline DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_movie_reviews_pipeline +date: 2024-09-16 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_movie_reviews_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726525480317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726525480317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_turkish_turkish_movie_reviews_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_turkish_turkish_movie_reviews_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
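+
+`df` is not defined in the snippet above; because the included models start with a DocumentAssembler, a single `text` column is the natural input. A hedged sketch with an illustrative Turkish review sentence:
+
+```python
+# Minimal sketch, assuming sparknlp.start() is available, the bundled
+# DocumentAssembler reads a column named "text", and the classifier writes "class".
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_turkish_turkish_movie_reviews_pipeline", lang="tr")
+
+df = spark.createDataFrame([["Bu film gerçekten çok güzeldi."]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```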
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_movie_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_movie_reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distillbert_for_rhetorical_rolelabeling_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distillbert_for_rhetorical_rolelabeling_pipeline_en.md new file mode 100644 index 00000000000000..24eea8961917af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distillbert_for_rhetorical_rolelabeling_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_for_rhetorical_rolelabeling_pipeline pipeline DistilBertForSequenceClassification from engineersaloni159 +author: John Snow Labs +name: distillbert_for_rhetorical_rolelabeling_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_for_rhetorical_rolelabeling_pipeline` is a English model originally trained by engineersaloni159. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_for_rhetorical_rolelabeling_pipeline_en_5.5.0_3.0_1726506184738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_for_rhetorical_rolelabeling_pipeline_en_5.5.0_3.0_1726506184738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distillbert_for_rhetorical_rolelabeling_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distillbert_for_rhetorical_rolelabeling_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_for_rhetorical_rolelabeling_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/engineersaloni159/DistillBERT_for_rhetorical_rolelabeling + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mic_nlp_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mic_nlp_en.md new file mode 100644 index 00000000000000..ce1b6c3a5db93a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mic_nlp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_mic_nlp RoBertaForSequenceClassification from agi-css +author: John Snow Labs +name: distilroberta_base_mic_nlp +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mic_nlp` is a English model originally trained by agi-css. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mic_nlp_en_5.5.0_3.0_1726518816618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mic_nlp_en_5.5.0_3.0_1726518816618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_mic_nlp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_mic_nlp", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mic_nlp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/agi-css/distilroberta-base-mic-nlp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mic_nlp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mic_nlp_pipeline_en.md new file mode 100644 index 00000000000000..16735a2f62b1c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mic_nlp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_mic_nlp_pipeline pipeline RoBertaForSequenceClassification from agi-css +author: John Snow Labs +name: distilroberta_base_mic_nlp_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mic_nlp_pipeline` is a English model originally trained by agi-css. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mic_nlp_pipeline_en_5.5.0_3.0_1726518832102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mic_nlp_pipeline_en_5.5.0_3.0_1726518832102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_mic_nlp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_mic_nlp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mic_nlp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/agi-css/distilroberta-base-mic-nlp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mrpc_glue_santi_cinotti_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mrpc_glue_santi_cinotti_en.md new file mode 100644 index 00000000000000..7f3524ec0002e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mrpc_glue_santi_cinotti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_mrpc_glue_santi_cinotti RoBertaForSequenceClassification from Santici +author: John Snow Labs +name: distilroberta_base_mrpc_glue_santi_cinotti +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mrpc_glue_santi_cinotti` is a English model originally trained by Santici. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_santi_cinotti_en_5.5.0_3.0_1726527525148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_santi_cinotti_en_5.5.0_3.0_1726527525148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_mrpc_glue_santi_cinotti","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_mrpc_glue_santi_cinotti", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mrpc_glue_santi_cinotti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Santici/distilroberta-base-mrpc-glue-santi-cinotti \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mrpc_glue_santi_cinotti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mrpc_glue_santi_cinotti_pipeline_en.md new file mode 100644 index 00000000000000..21aeaaf1cc7be5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_base_mrpc_glue_santi_cinotti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_mrpc_glue_santi_cinotti_pipeline pipeline RoBertaForSequenceClassification from Santici +author: John Snow Labs +name: distilroberta_base_mrpc_glue_santi_cinotti_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mrpc_glue_santi_cinotti_pipeline` is a English model originally trained by Santici. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_santi_cinotti_pipeline_en_5.5.0_3.0_1726527540523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_santi_cinotti_pipeline_en_5.5.0_3.0_1726527540523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_mrpc_glue_santi_cinotti_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_mrpc_glue_santi_cinotti_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mrpc_glue_santi_cinotti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Santici/distilroberta-base-mrpc-glue-santi-cinotti + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_finetuned_bloomberg_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_finetuned_bloomberg_classifier_en.md new file mode 100644 index 00000000000000..f3b3b3a1a07024 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_finetuned_bloomberg_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_finetuned_bloomberg_classifier RoBertaForSequenceClassification from oli1728 +author: John Snow Labs +name: distilroberta_finetuned_bloomberg_classifier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_finetuned_bloomberg_classifier` is a English model originally trained by oli1728. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_bloomberg_classifier_en_5.5.0_3.0_1726518937219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_bloomberg_classifier_en_5.5.0_3.0_1726518937219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_finetuned_bloomberg_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_finetuned_bloomberg_classifier", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_finetuned_bloomberg_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/oli1728/distilroberta-finetuned-bloomberg-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_finetuned_bloomberg_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_finetuned_bloomberg_classifier_pipeline_en.md new file mode 100644 index 00000000000000..af64a427b7081a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_finetuned_bloomberg_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_finetuned_bloomberg_classifier_pipeline pipeline RoBertaForSequenceClassification from oli1728 +author: John Snow Labs +name: distilroberta_finetuned_bloomberg_classifier_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_finetuned_bloomberg_classifier_pipeline` is a English model originally trained by oli1728. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_bloomberg_classifier_pipeline_en_5.5.0_3.0_1726518951988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_bloomberg_classifier_pipeline_en_5.5.0_3.0_1726518951988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_finetuned_bloomberg_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_finetuned_bloomberg_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_finetuned_bloomberg_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/oli1728/distilroberta-finetuned-bloomberg-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilroberta_ner_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_ner_en.md new file mode 100644 index 00000000000000..b2575303ab9f7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilroberta_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_ner RoBertaForTokenClassification from OmVibhandik +author: John Snow Labs +name: distilroberta_ner +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_ner` is a English model originally trained by OmVibhandik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_ner_en_5.5.0_3.0_1726482528159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_ner_en_5.5.0_3.0_1726482528159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("distilroberta_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("distilroberta_ner", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
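+
+For token classification it is usually more helpful to see each token next to its predicted tag rather than the raw annotation structs; a hedged sketch, assuming `pipelineDF` from the Python example above:
+
+```python
+# Pair each token with its predicted entity tag.
+# Assumes `pipelineDF` was produced by the Python pipeline above.
+from pyspark.sql import functions as F
+
+(pipelineDF
+    .select(F.col("token.result").alias("tokens"), F.col("ner.result").alias("tags"))
+    .select(F.explode(F.arrays_zip("tokens", "tags")).alias("pair"))
+    .select(F.col("pair.tokens").alias("token"), F.col("pair.tags").alias("ner_tag"))
+    .show(truncate=False))
+```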
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.8 MB| + +## References + +https://huggingface.co/OmVibhandik/distilroberta_NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-dsarreaderroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-dsarreaderroberta_pipeline_en.md new file mode 100644 index 00000000000000..3370175720851a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-dsarreaderroberta_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English dsarreaderroberta_pipeline pipeline RoBertaForQuestionAnswering from hsmith-morganhill +author: John Snow Labs +name: dsarreaderroberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dsarreaderroberta_pipeline` is a English model originally trained by hsmith-morganhill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dsarreaderroberta_pipeline_en_5.5.0_3.0_1726460579136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dsarreaderroberta_pipeline_en_5.5.0_3.0_1726460579136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dsarreaderroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dsarreaderroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dsarreaderroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/hsmith-morganhill/dsarReaderRoberta + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-dummy_2_en.md b/docs/_posts/ahmedlone127/2024-09-16-dummy_2_en.md new file mode 100644 index 00000000000000..3286e6623e0cee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-dummy_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_2 BertForSequenceClassification from AnonymousSub +author: John Snow Labs +name: dummy_2 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_2` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_2_en_5.5.0_3.0_1726492789154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_2_en_5.5.0_3.0_1726492789154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("dummy_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("dummy_2", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/AnonymousSub/dummy_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-e2m_dataset_tags_950_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-e2m_dataset_tags_950_pipeline_en.md new file mode 100644 index 00000000000000..5ac925c4f211be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-e2m_dataset_tags_950_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English e2m_dataset_tags_950_pipeline pipeline MarianTransformer from mekaneeky +author: John Snow Labs +name: e2m_dataset_tags_950_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e2m_dataset_tags_950_pipeline` is a English model originally trained by mekaneeky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e2m_dataset_tags_950_pipeline_en_5.5.0_3.0_1726457646558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e2m_dataset_tags_950_pipeline_en_5.5.0_3.0_1726457646558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e2m_dataset_tags_950_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e2m_dataset_tags_950_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e2m_dataset_tags_950_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|530.7 MB| + +## References + +https://huggingface.co/mekaneeky/e2m-dataset-tags-950 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_pipeline_en.md new file mode 100644 index 00000000000000..f0748846f12c0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English efficient_mlm_m0_40_801010_pipeline pipeline RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_40_801010_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_801010_pipeline` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_801010_pipeline_en_5.5.0_3.0_1726514122993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_801010_pipeline_en_5.5.0_3.0_1726514122993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("efficient_mlm_m0_40_801010_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("efficient_mlm_m0_40_801010_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
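+
+As with the other pipeline snippets in this batch, `df` is assumed to already exist. For this embeddings pipeline the included DocumentAssembler reads a `text` column; the sketch below also assumes the RoBertaEmbeddings stage writes to a column named `embeddings`, which should be confirmed against `pipeline.model.stages`.
+
+```python
+# Hedged sketch: run the embeddings pipeline and peek at a few vector dimensions.
+# Assumptions: sparknlp.start() is available; the output column is "embeddings".
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+pipeline = PretrainedPipeline("efficient_mlm_m0_40_801010_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# Each element of `embeddings` is one annotation per token; the vector itself
+# sits in the annotation's `embeddings` field.
+annotations.selectExpr("explode(embeddings) AS emb") \
+    .selectExpr("emb.result AS token", "slice(emb.embeddings, 1, 5) AS first_dims") \
+    .show(truncate=False)
+```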
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_801010_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|845.4 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.40-801010 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline_en.md new file mode 100644 index 00000000000000..6d4d4c1b390e15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1726470578327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1726470578327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random1_seed2-twitter-roberta-base-2021-124m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-english_tonga_tonga_islands_arabic_version3_en.md b/docs/_posts/ahmedlone127/2024-09-16-english_tonga_tonga_islands_arabic_version3_en.md new file mode 100644 index 00000000000000..ec0ed874b6fbaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-english_tonga_tonga_islands_arabic_version3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_tonga_tonga_islands_arabic_version3 MarianTransformer from wingo-dz +author: John Snow Labs +name: english_tonga_tonga_islands_arabic_version3 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_tonga_tonga_islands_arabic_version3` is a English model originally trained by wingo-dz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_tonga_tonga_islands_arabic_version3_en_5.5.0_3.0_1726503080795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_tonga_tonga_islands_arabic_version3_en_5.5.0_3.0_1726503080795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("english_tonga_tonga_islands_arabic_version3","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("english_tonga_tonga_islands_arabic_version3","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
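+
+After `pipelineModel.transform(data)`, the translated text ends up in the `translation` annotation column (given the column wiring shown above); a brief sketch of collecting it:
+
+```python
+# Hedged sketch: read the translations produced by the pipeline above.
+# Assumes `pipelineDF` from the Python example.
+from pyspark.sql import functions as F
+
+(pipelineDF
+    .select(F.explode("translation").alias("t"))
+    .select(F.col("t.result").alias("translated_text"))
+    .show(truncate=False))
+```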
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_tonga_tonga_islands_arabic_version3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|528.2 MB| + +## References + +https://huggingface.co/wingo-dz/en-to-ar-version3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-english_tonga_tonga_islands_arabic_version3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-english_tonga_tonga_islands_arabic_version3_pipeline_en.md new file mode 100644 index 00000000000000..23623b23a7850d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-english_tonga_tonga_islands_arabic_version3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English english_tonga_tonga_islands_arabic_version3_pipeline pipeline MarianTransformer from wingo-dz +author: John Snow Labs +name: english_tonga_tonga_islands_arabic_version3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_tonga_tonga_islands_arabic_version3_pipeline` is a English model originally trained by wingo-dz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_tonga_tonga_islands_arabic_version3_pipeline_en_5.5.0_3.0_1726503106089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_tonga_tonga_islands_arabic_version3_pipeline_en_5.5.0_3.0_1726503106089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("english_tonga_tonga_islands_arabic_version3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("english_tonga_tonga_islands_arabic_version3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_tonga_tonga_islands_arabic_version3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.8 MB| + +## References + +https://huggingface.co/wingo-dz/en-to-ar-version3 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-enlm_roberta_conll2003_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-enlm_roberta_conll2003_final_pipeline_en.md new file mode 100644 index 00000000000000..8c751b1e7ed5c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-enlm_roberta_conll2003_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enlm_roberta_conll2003_final_pipeline pipeline XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_conll2003_final_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_conll2003_final_pipeline` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_pipeline_en_5.5.0_3.0_1726495986638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_pipeline_en_5.5.0_3.0_1726495986638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enlm_roberta_conll2003_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enlm_roberta_conll2003_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_conll2003_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.4 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-conll2003-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-epc_adept_roberta_l_all_en.md b/docs/_posts/ahmedlone127/2024-09-16-epc_adept_roberta_l_all_en.md new file mode 100644 index 00000000000000..111c59037898bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-epc_adept_roberta_l_all_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English epc_adept_roberta_l_all RoBertaForSequenceClassification from veronica320 +author: John Snow Labs +name: epc_adept_roberta_l_all +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`epc_adept_roberta_l_all` is a English model originally trained by veronica320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/epc_adept_roberta_l_all_en_5.5.0_3.0_1726527015402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/epc_adept_roberta_l_all_en_5.5.0_3.0_1726527015402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the assembled document together with its tokens.
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("epc_adept_roberta_l_all","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("epc_adept_roberta_l_all", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
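+
+Once `pipelineDF` above has been computed, the predicted label for each row can be read back from the `class` annotation column. A small illustrative selection (column names follow the settings above; the `metadata` field carrying per-label scores is an assumption that generally holds for the classifier annotators):
+
+```python
+# `class` is an array of annotations: `result` holds the predicted label,
+# `metadata` the label-to-score map produced by the classifier.
+pipelineDF.select("text", "class.result", "class.metadata").show(truncate=False)
+```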
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|epc_adept_roberta_l_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/veronica320/EPC_ADEPT_roberta-l_all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-epc_adept_roberta_l_all_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-epc_adept_roberta_l_all_pipeline_en.md new file mode 100644 index 00000000000000..0458a157d19771 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-epc_adept_roberta_l_all_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English epc_adept_roberta_l_all_pipeline pipeline RoBertaForSequenceClassification from veronica320 +author: John Snow Labs +name: epc_adept_roberta_l_all_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`epc_adept_roberta_l_all_pipeline` is a English model originally trained by veronica320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/epc_adept_roberta_l_all_pipeline_en_5.5.0_3.0_1726527095417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/epc_adept_roberta_l_all_pipeline_en_5.5.0_3.0_1726527095417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("epc_adept_roberta_l_all_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("epc_adept_roberta_l_all_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|epc_adept_roberta_l_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/veronica320/EPC_ADEPT_roberta-l_all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-esperberto_ebtihal_en.md b/docs/_posts/ahmedlone127/2024-09-16-esperberto_ebtihal_en.md new file mode 100644 index 00000000000000..198808431dfc56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-esperberto_ebtihal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English esperberto_ebtihal RoBertaEmbeddings from Ebtihal +author: John Snow Labs +name: esperberto_ebtihal +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esperberto_ebtihal` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esperberto_ebtihal_en_5.5.0_3.0_1726513946631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esperberto_ebtihal_en_5.5.0_3.0_1726513946631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("esperberto_ebtihal","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("esperberto_ebtihal","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
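+
+The `embeddings` column produced above holds one annotation per token, each carrying the token text and its vector. A minimal, illustrative sketch for pulling those vectors out of `pipelineDF` (it relies only on the column names set above):
+
+```python
+from pyspark.sql import functions as F
+
+# One output row per token: the token text plus its embedding vector.
+token_vectors = pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("token"),
+            F.col("emb.embeddings").alias("vector"))
+
+token_vectors.show(truncate=80)
+```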
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esperberto_ebtihal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.7 MB| + +## References + +https://huggingface.co/Ebtihal/EsperBERTo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fake_news_detection_sonatafyai_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-fake_news_detection_sonatafyai_roberta_pipeline_en.md new file mode 100644 index 00000000000000..6951d2a81b4712 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fake_news_detection_sonatafyai_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fake_news_detection_sonatafyai_roberta_pipeline pipeline RoBertaForSequenceClassification from Sonatafyai +author: John Snow Labs +name: fake_news_detection_sonatafyai_roberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_detection_sonatafyai_roberta_pipeline` is a English model originally trained by Sonatafyai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_detection_sonatafyai_roberta_pipeline_en_5.5.0_3.0_1726470196588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_detection_sonatafyai_roberta_pipeline_en_5.5.0_3.0_1726470196588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fake_news_detection_sonatafyai_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fake_news_detection_sonatafyai_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_detection_sonatafyai_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/Sonatafyai/Fake_news_Detection_SonatafyAI_RoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-financial_sentiment_analysis_shaivn_en.md b/docs/_posts/ahmedlone127/2024-09-16-financial_sentiment_analysis_shaivn_en.md new file mode 100644 index 00000000000000..493d63cadcac16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-financial_sentiment_analysis_shaivn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_sentiment_analysis_shaivn BertForSequenceClassification from Shaivn +author: John Snow Labs +name: financial_sentiment_analysis_shaivn +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_sentiment_analysis_shaivn` is a English model originally trained by Shaivn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_sentiment_analysis_shaivn_en_5.5.0_3.0_1726498910014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_sentiment_analysis_shaivn_en_5.5.0_3.0_1726498910014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the assembled document together with its tokens.
+sequenceClassifier = BertForSequenceClassification.pretrained("financial_sentiment_analysis_shaivn","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("financial_sentiment_analysis_shaivn", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_sentiment_analysis_shaivn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Shaivn/Financial-Sentiment-Analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tune_whisper_small_inayat_en.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tune_whisper_small_inayat_en.md new file mode 100644 index 00000000000000..aa9dd8aadaeadf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tune_whisper_small_inayat_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English fine_tune_whisper_small_inayat WhisperForCTC from Inayat +author: John Snow Labs +name: fine_tune_whisper_small_inayat +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_whisper_small_inayat` is a English model originally trained by Inayat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_inayat_en_5.5.0_3.0_1726477282678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_inayat_en_5.5.0_3.0_1726477282678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with an "audio_content" column holding arrays of
+# float audio samples (see the loading sketch below the Scala example).
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("fine_tune_whisper_small_inayat","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("fine_tune_whisper_small_inayat", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
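+
+Both snippets above assume `data` already exists. A minimal sketch of how such a DataFrame might be built follows; `librosa` and the file path are assumptions here, and any loader that yields a flat array of float samples (16 kHz mono, as Whisper expects) would work just as well:
+
+```python
+import librosa  # assumed helper; not part of Spark NLP
+
+# Load a local audio file into raw float samples at 16 kHz (path is a placeholder).
+raw_audio, _ = librosa.load("/path/to/audio.wav", sr=16000)
+
+# The "audio_content" column must hold the sample array for each recording.
+data = spark.createDataFrame([[raw_audio.tolist()]]).toDF("audio_content")
+
+pipelineDF = pipeline.fit(data).transform(data)
+pipelineDF.select("text.result").show(truncate=False)
+```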
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_whisper_small_inayat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Inayat/Fine_tune_whisper_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_ariabert_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_ariabert_pipeline_fa.md new file mode 100644 index 00000000000000..f8cdc4cbe8ba93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_ariabert_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian fine_tuned_ariabert_pipeline pipeline RoBertaForSequenceClassification from iMahdiGhazavi +author: John Snow Labs +name: fine_tuned_ariabert_pipeline +date: 2024-09-16 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_ariabert_pipeline` is a Persian model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_ariabert_pipeline_fa_5.5.0_3.0_1726527714607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_ariabert_pipeline_fa_5.5.0_3.0_1726527714607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_ariabert_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_ariabert_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_ariabert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|492.0 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/fine-tuned-AriaBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_helsinki_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_helsinki_model_pipeline_en.md new file mode 100644 index 00000000000000..ba9e3c1c477038 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_helsinki_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_helsinki_model_pipeline pipeline MarianTransformer from EricPeter +author: John Snow Labs +name: fine_tuned_helsinki_model_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_helsinki_model_pipeline` is a English model originally trained by EricPeter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_helsinki_model_pipeline_en_5.5.0_3.0_1726491238732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_helsinki_model_pipeline_en_5.5.0_3.0_1726491238732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_helsinki_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_helsinki_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_helsinki_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|530.4 MB| + +## References + +https://huggingface.co/EricPeter/fine_tuned_helsinki_model + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_roberta_xlarge_en.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_roberta_xlarge_en.md new file mode 100644 index 00000000000000..4b08e8d544fca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_roberta_xlarge_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fine_tuned_roberta_xlarge BertForQuestionAnswering from kiwakwok +author: John Snow Labs +name: fine_tuned_roberta_xlarge +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_xlarge` is a English model originally trained by kiwakwok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_xlarge_en_5.5.0_3.0_1726507697876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_xlarge_en_5.5.0_3.0_1726507697876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import BertForQuestionAnswering
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_roberta_xlarge","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_roberta_xlarge", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
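+
+After the pipeline above has run, the predicted span sits in the `answer` annotation column. A short illustrative check, relying only on the column names configured above:
+
+```python
+# Show each question next to the span the model extracted from its context.
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```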
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_xlarge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|382.2 MB| + +## References + +https://huggingface.co/kiwakwok/fine-tuned-roberta-xlarge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_roberta_xlarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_roberta_xlarge_pipeline_en.md new file mode 100644 index 00000000000000..48719b514acb2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_roberta_xlarge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_roberta_xlarge_pipeline pipeline BertForQuestionAnswering from kiwakwok +author: John Snow Labs +name: fine_tuned_roberta_xlarge_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_xlarge_pipeline` is a English model originally trained by kiwakwok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_xlarge_pipeline_en_5.5.0_3.0_1726507715902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_xlarge_pipeline_en_5.5.0_3.0_1726507715902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_roberta_xlarge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_roberta_xlarge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_xlarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.3 MB| + +## References + +https://huggingface.co/kiwakwok/fine-tuned-roberta-xlarge + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuned_iitp_pdt_review_roberta_hinglish_big_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuned_iitp_pdt_review_roberta_hinglish_big_en.md new file mode 100644 index 00000000000000..bde04135a5da88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuned_iitp_pdt_review_roberta_hinglish_big_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_iitp_pdt_review_roberta_hinglish_big RoBertaForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_iitp_pdt_review_roberta_hinglish_big +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_iitp_pdt_review_roberta_hinglish_big` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_iitp_pdt_review_roberta_hinglish_big_en_5.5.0_3.0_1726470845801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_iitp_pdt_review_roberta_hinglish_big_en_5.5.0_3.0_1726470845801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the assembled document together with its tokens.
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_iitp_pdt_review_roberta_hinglish_big","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_iitp_pdt_review_roberta_hinglish_big", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_iitp_pdt_review_roberta_hinglish_big| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|251.5 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-iitp_pdt_review-roberta-hinglish-big \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuned_opusmt_english_french_hindi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuned_opusmt_english_french_hindi_pipeline_en.md new file mode 100644 index 00000000000000..67b3feb52ec9cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuned_opusmt_english_french_hindi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_opusmt_english_french_hindi_pipeline pipeline MarianTransformer from ritika-kumar +author: John Snow Labs +name: finetuned_opusmt_english_french_hindi_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_opusmt_english_french_hindi_pipeline` is a English model originally trained by ritika-kumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_opusmt_english_french_hindi_pipeline_en_5.5.0_3.0_1726509774154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_opusmt_english_french_hindi_pipeline_en_5.5.0_3.0_1726509774154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_opusmt_english_french_hindi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_opusmt_english_french_hindi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_opusmt_english_french_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/ritika-kumar/finetuned-opusmt-en-fr-hi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuned_sentiment_analysis_modell_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuned_sentiment_analysis_modell_pipeline_en.md new file mode 100644 index 00000000000000..f836df0e95155f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuned_sentiment_analysis_modell_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sentiment_analysis_modell_pipeline pipeline RoBertaForSequenceClassification from HerbertAIHug +author: John Snow Labs +name: finetuned_sentiment_analysis_modell_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_analysis_modell_pipeline` is a English model originally trained by HerbertAIHug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_analysis_modell_pipeline_en_5.5.0_3.0_1726519059239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_analysis_modell_pipeline_en_5.5.0_3.0_1726519059239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_sentiment_analysis_modell_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_sentiment_analysis_modell_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_analysis_modell_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.8 MB| + +## References + +https://huggingface.co/HerbertAIHug/finetuned_sentiment_analysis_modell + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_halvion_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_halvion_en.md new file mode 100644 index 00000000000000..4e620e438ba612 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_halvion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_halvion DistilBertForSequenceClassification from halvion +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_halvion +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_halvion` is a English model originally trained by halvion. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_halvion_en_5.5.0_3.0_1726506416126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_halvion_en_5.5.0_3.0_1726506416126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the assembled document together with its tokens.
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_halvion","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_halvion", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_halvion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/halvion/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_samsellj_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_samsellj_en.md new file mode 100644 index 00000000000000..e7c4098e0cf33d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_samsellj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_samsellj DistilBertForSequenceClassification from samsellj +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_samsellj +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_samsellj` is a English model originally trained by samsellj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_samsellj_en_5.5.0_3.0_1726525478202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_samsellj_en_5.5.0_3.0_1726525478202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the assembled document together with its tokens.
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_samsellj","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_samsellj", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_samsellj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/samsellj/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-focaltrain_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-focaltrain_pipeline_en.md new file mode 100644 index 00000000000000..008e7d4c345a95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-focaltrain_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English focaltrain_pipeline pipeline RoBertaForSequenceClassification from V12X-ksr +author: John Snow Labs +name: focaltrain_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`focaltrain_pipeline` is a English model originally trained by V12X-ksr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/focaltrain_pipeline_en_5.5.0_3.0_1726518003597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/focaltrain_pipeline_en_5.5.0_3.0_1726518003597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("focaltrain_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("focaltrain_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|focaltrain_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.1 MB| + +## References + +https://huggingface.co/V12X-ksr/FOCALtrain + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ft_distilbert_base_uncased_with_squad_en.md b/docs/_posts/ahmedlone127/2024-09-16-ft_distilbert_base_uncased_with_squad_en.md new file mode 100644 index 00000000000000..59fccf78746357 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ft_distilbert_base_uncased_with_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English ft_distilbert_base_uncased_with_squad DistilBertForQuestionAnswering from aisuko +author: John Snow Labs +name: ft_distilbert_base_uncased_with_squad +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert_base_uncased_with_squad` is a English model originally trained by aisuko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_with_squad_en_5.5.0_3.0_1726469765473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_with_squad_en_5.5.0_3.0_1726469765473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import DistilBertForQuestionAnswering
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("ft_distilbert_base_uncased_with_squad","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("ft_distilbert_base_uncased_with_squad", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert_base_uncased_with_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/aisuko/ft-distilbert-base-uncased-with-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ft_distilbert_base_uncased_with_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-ft_distilbert_base_uncased_with_squad_pipeline_en.md new file mode 100644 index 00000000000000..1ac22f52964a0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ft_distilbert_base_uncased_with_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ft_distilbert_base_uncased_with_squad_pipeline pipeline DistilBertForQuestionAnswering from aisuko +author: John Snow Labs +name: ft_distilbert_base_uncased_with_squad_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert_base_uncased_with_squad_pipeline` is a English model originally trained by aisuko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_with_squad_pipeline_en_5.5.0_3.0_1726469777151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_with_squad_pipeline_en_5.5.0_3.0_1726469777151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ft_distilbert_base_uncased_with_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ft_distilbert_base_uncased_with_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert_base_uncased_with_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/aisuko/ft-distilbert-base-uncased-with-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-furina_seed42_eng_amh_hau_basic_5e_06_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-furina_seed42_eng_amh_hau_basic_5e_06_pipeline_en.md new file mode 100644 index 00000000000000..7bd401255cb90b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-furina_seed42_eng_amh_hau_basic_5e_06_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English furina_seed42_eng_amh_hau_basic_5e_06_pipeline pipeline XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_amh_hau_basic_5e_06_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_amh_hau_basic_5e_06_pipeline` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_amh_hau_basic_5e_06_pipeline_en_5.5.0_3.0_1726517698909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_amh_hau_basic_5e_06_pipeline_en_5.5.0_3.0_1726517698909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("furina_seed42_eng_amh_hau_basic_5e_06_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("furina_seed42_eng_amh_hau_basic_5e_06_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_amh_hau_basic_5e_06_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_amh_hau_basic_5e-06 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-g_bert_ner_en.md b/docs/_posts/ahmedlone127/2024-09-16-g_bert_ner_en.md new file mode 100644 index 00000000000000..c4f2edc6cc31b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-g_bert_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English g_bert_ner BertForTokenClassification from eddiegulay +author: John Snow Labs +name: g_bert_ner +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`g_bert_ner` is a English model originally trained by eddiegulay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/g_bert_ner_en_5.5.0_3.0_1726461602692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/g_bert_ner_en_5.5.0_3.0_1726461602692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The token classifier emits one named-entity tag per token.
+tokenClassifier = BertForTokenClassification.pretrained("g_bert_ner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("g_bert_ner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
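+
+In the transformed DataFrame above, `token.result` and `ner.result` are parallel arrays: position *i* in the tag array labels position *i* in the token array. A brief illustrative selection:
+
+```python
+# Tokens and their predicted tags, aligned by position.
+pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
+```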
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|g_bert_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eddiegulay/g-bert-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-gal_ner_iwcg_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-gal_ner_iwcg_6_pipeline_en.md new file mode 100644 index 00000000000000..2d2835ea360278 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-gal_ner_iwcg_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_ner_iwcg_6_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iwcg_6_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iwcg_6_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_6_pipeline_en_5.5.0_3.0_1726497324150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_6_pipeline_en_5.5.0_3.0_1726497324150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_ner_iwcg_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_ner_iwcg_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iwcg_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iwcg-6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-helper3_en.md b/docs/_posts/ahmedlone127/2024-09-16-helper3_en.md new file mode 100644 index 00000000000000..2d922960c406cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-helper3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English helper3 RoBertaForSequenceClassification from raima2001 +author: John Snow Labs +name: helper3 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helper3` is a English model originally trained by raima2001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helper3_en_5.5.0_3.0_1726455347049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helper3_en_5.5.0_3.0_1726455347049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("helper3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("helper3", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helper3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|85.0 MB| + +## References + +https://huggingface.co/raima2001/helper3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-helper3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-helper3_pipeline_en.md new file mode 100644 index 00000000000000..d339caac5f05da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-helper3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helper3_pipeline pipeline RoBertaForSequenceClassification from raima2001 +author: John Snow Labs +name: helper3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helper3_pipeline` is a English model originally trained by raima2001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helper3_pipeline_en_5.5.0_3.0_1726455351280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helper3_pipeline_en_5.5.0_3.0_1726455351280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("helper3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("helper3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
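+
+The `df` in the snippet above is any Spark DataFrame with a `text` column. Below is a minimal sketch of building one and reading the prediction back; it assumes the classifier stage keeps the `class` output column used in the standalone helper3 card.
+
+```python
+# A minimal sketch: build a one-row DataFrame with a "text" column and read the predicted label
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```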
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helper3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|85.0 MB| + +## References + +https://huggingface.co/raima2001/helper3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-iad_hw_nlp_en.md b/docs/_posts/ahmedlone127/2024-09-16-iad_hw_nlp_en.md new file mode 100644 index 00000000000000..faf43b854b7206 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-iad_hw_nlp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English iad_hw_nlp MarianTransformer from kokarevklepa +author: John Snow Labs +name: iad_hw_nlp +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iad_hw_nlp` is a English model originally trained by kokarevklepa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iad_hw_nlp_en_5.5.0_3.0_1726491656796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iad_hw_nlp_en_5.5.0_3.0_1726491656796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("iad_hw_nlp","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("iad_hw_nlp","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
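+
+The translated text lands in the `translation` column of the transformed DataFrame. A minimal sketch for reading it back, assuming the `pipelineDF` built above:
+
+```python
+# Print the translation produced for each detected sentence
+pipelineDF.select("translation.result").show(truncate=False)
+```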
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iad_hw_nlp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|524.3 MB| + +## References + +https://huggingface.co/kokarevklepa/iad_hw_nlp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-iit_token_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-iit_token_pipeline_en.md new file mode 100644 index 00000000000000..9a13c5f7e54ad8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-iit_token_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English iit_token_pipeline pipeline DistilBertForQuestionAnswering from teju-1210 +author: John Snow Labs +name: iit_token_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iit_token_pipeline` is a English model originally trained by teju-1210. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iit_token_pipeline_en_5.5.0_3.0_1726515098712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iit_token_pipeline_en_5.5.0_3.0_1726515098712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("iit_token_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("iit_token_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iit_token_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/teju-1210/IIT_Token + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-imdb_distilbert_funetuned_en.md b/docs/_posts/ahmedlone127/2024-09-16-imdb_distilbert_funetuned_en.md new file mode 100644 index 00000000000000..22709636287db3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-imdb_distilbert_funetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_distilbert_funetuned DistilBertForSequenceClassification from ashaduzzaman +author: John Snow Labs +name: imdb_distilbert_funetuned +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_distilbert_funetuned` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_distilbert_funetuned_en_5.5.0_3.0_1726506163179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_distilbert_funetuned_en_5.5.0_3.0_1726506163179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_distilbert_funetuned","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_distilbert_funetuned", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_distilbert_funetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ashaduzzaman/imdb-distilbert-funetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-imdb_distilbert_raj_sharma_en.md b/docs/_posts/ahmedlone127/2024-09-16-imdb_distilbert_raj_sharma_en.md new file mode 100644 index 00000000000000..4358aeac16555f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-imdb_distilbert_raj_sharma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_distilbert_raj_sharma DistilBertForSequenceClassification from Raj-Sharma +author: John Snow Labs +name: imdb_distilbert_raj_sharma +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_distilbert_raj_sharma` is a English model originally trained by Raj-Sharma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_distilbert_raj_sharma_en_5.5.0_3.0_1726506668151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_distilbert_raj_sharma_en_5.5.0_3.0_1726506668151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_distilbert_raj_sharma","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_distilbert_raj_sharma", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_distilbert_raj_sharma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Raj-Sharma/imdb-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-imdb_score_predict_roberta_large_fulldata_en.md b/docs/_posts/ahmedlone127/2024-09-16-imdb_score_predict_roberta_large_fulldata_en.md new file mode 100644 index 00000000000000..90b085e492784c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-imdb_score_predict_roberta_large_fulldata_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_score_predict_roberta_large_fulldata RoBertaForSequenceClassification from Mathnub +author: John Snow Labs +name: imdb_score_predict_roberta_large_fulldata +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_score_predict_roberta_large_fulldata` is a English model originally trained by Mathnub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_score_predict_roberta_large_fulldata_en_5.5.0_3.0_1726518271129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_score_predict_roberta_large_fulldata_en_5.5.0_3.0_1726518271129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdb_score_predict_roberta_large_fulldata","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdb_score_predict_roberta_large_fulldata", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_score_predict_roberta_large_fulldata| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Mathnub/imdb-score-predict-roberta-large-fulldata \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-imdb_score_predict_roberta_large_fulldata_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-imdb_score_predict_roberta_large_fulldata_pipeline_en.md new file mode 100644 index 00000000000000..8b1b9563f5b1c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-imdb_score_predict_roberta_large_fulldata_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb_score_predict_roberta_large_fulldata_pipeline pipeline RoBertaForSequenceClassification from Mathnub +author: John Snow Labs +name: imdb_score_predict_roberta_large_fulldata_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_score_predict_roberta_large_fulldata_pipeline` is a English model originally trained by Mathnub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_score_predict_roberta_large_fulldata_pipeline_en_5.5.0_3.0_1726518340716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_score_predict_roberta_large_fulldata_pipeline_en_5.5.0_3.0_1726518340716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdb_score_predict_roberta_large_fulldata_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdb_score_predict_roberta_large_fulldata_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_score_predict_roberta_large_fulldata_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Mathnub/imdb-score-predict-roberta-large-fulldata + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-immigration_disbalanced_classifier_german_en.md b/docs/_posts/ahmedlone127/2024-09-16-immigration_disbalanced_classifier_german_en.md new file mode 100644 index 00000000000000..86612f79f0c3fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-immigration_disbalanced_classifier_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English immigration_disbalanced_classifier_german XlmRoBertaForSequenceClassification from luissattelmayer +author: John Snow Labs +name: immigration_disbalanced_classifier_german +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`immigration_disbalanced_classifier_german` is a English model originally trained by luissattelmayer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/immigration_disbalanced_classifier_german_en_5.5.0_3.0_1726517112645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/immigration_disbalanced_classifier_german_en_5.5.0_3.0_1726517112645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("immigration_disbalanced_classifier_german","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("immigration_disbalanced_classifier_german", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|immigration_disbalanced_classifier_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|806.3 MB| + +## References + +https://huggingface.co/luissattelmayer/immigration_disbalanced_classifier_german \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-immigration_disbalanced_classifier_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-immigration_disbalanced_classifier_german_pipeline_en.md new file mode 100644 index 00000000000000..137c27c5e47686 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-immigration_disbalanced_classifier_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English immigration_disbalanced_classifier_german_pipeline pipeline XlmRoBertaForSequenceClassification from luissattelmayer +author: John Snow Labs +name: immigration_disbalanced_classifier_german_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`immigration_disbalanced_classifier_german_pipeline` is a English model originally trained by luissattelmayer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/immigration_disbalanced_classifier_german_pipeline_en_5.5.0_3.0_1726517235915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/immigration_disbalanced_classifier_german_pipeline_en_5.5.0_3.0_1726517235915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("immigration_disbalanced_classifier_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("immigration_disbalanced_classifier_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|immigration_disbalanced_classifier_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|806.3 MB| + +## References + +https://huggingface.co/luissattelmayer/immigration_disbalanced_classifier_german + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indicbertv2_mlm_only_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-16-indicbertv2_mlm_only_pipeline_xx.md new file mode 100644 index 00000000000000..1c85bc05b9c3a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indicbertv2_mlm_only_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual indicbertv2_mlm_only_pipeline pipeline BertEmbeddings from ai4bharat +author: John Snow Labs +name: indicbertv2_mlm_only_pipeline +date: 2024-09-16 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indicbertv2_mlm_only_pipeline` is a Multilingual model originally trained by ai4bharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indicbertv2_mlm_only_pipeline_xx_5.5.0_3.0_1726464181940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indicbertv2_mlm_only_pipeline_xx_5.5.0_3.0_1726464181940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("indicbertv2_mlm_only_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("indicbertv2_mlm_only_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indicbertv2_mlm_only_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/ai4bharat/IndicBERTv2-MLM-only + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indicbertv2_mlm_only_xx.md b/docs/_posts/ahmedlone127/2024-09-16-indicbertv2_mlm_only_xx.md new file mode 100644 index 00000000000000..dc93ccc6c31017 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indicbertv2_mlm_only_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual indicbertv2_mlm_only BertEmbeddings from ai4bharat +author: John Snow Labs +name: indicbertv2_mlm_only +date: 2024-09-16 +tags: [xx, open_source, onnx, embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indicbertv2_mlm_only` is a Multilingual model originally trained by ai4bharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indicbertv2_mlm_only_xx_5.5.0_3.0_1726464136013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indicbertv2_mlm_only_xx_5.5.0_3.0_1726464136013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("indicbertv2_mlm_only","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("indicbertv2_mlm_only","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
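+
+For downstream Spark ML stages it is often convenient to turn the embedding annotations into plain vectors. The sketch below is one possible follow-up, not part of the model itself; it assumes the standard Spark NLP `EmbeddingsFinisher` and the `pipelineDF` produced above.
+
+```python
+# Convert embedding annotations into Spark ML vectors for downstream use
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(1, truncate=80)
+```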
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indicbertv2_mlm_only| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/ai4bharat/IndicBERTv2-MLM-only \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indobert_base_indonesian_emotion_analysis_v11_id.md b/docs/_posts/ahmedlone127/2024-09-16-indobert_base_indonesian_emotion_analysis_v11_id.md new file mode 100644 index 00000000000000..4db10dfd750532 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indobert_base_indonesian_emotion_analysis_v11_id.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Indonesian indobert_base_indonesian_emotion_analysis_v11 BertForSequenceClassification from Aardiiiiy +author: John Snow Labs +name: indobert_base_indonesian_emotion_analysis_v11 +date: 2024-09-16 +tags: [id, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobert_base_indonesian_emotion_analysis_v11` is a Indonesian model originally trained by Aardiiiiy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_base_indonesian_emotion_analysis_v11_id_5.5.0_3.0_1726499496333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_base_indonesian_emotion_analysis_v11_id_5.5.0_3.0_1726499496333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("indobert_base_indonesian_emotion_analysis_v11","id") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("indobert_base_indonesian_emotion_analysis_v11", "id")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_base_indonesian_emotion_analysis_v11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|id| +|Size:|414.0 MB| + +## References + +https://huggingface.co/Aardiiiiy/indobert-base-Indonesian-emotion-analysis-v11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indobert_base_indonesian_emotion_analysis_v11_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-16-indobert_base_indonesian_emotion_analysis_v11_pipeline_id.md new file mode 100644 index 00000000000000..34340d796034de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indobert_base_indonesian_emotion_analysis_v11_pipeline_id.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Indonesian indobert_base_indonesian_emotion_analysis_v11_pipeline pipeline BertForSequenceClassification from Aardiiiiy +author: John Snow Labs +name: indobert_base_indonesian_emotion_analysis_v11_pipeline +date: 2024-09-16 +tags: [id, open_source, pipeline, onnx] +task: Text Classification +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobert_base_indonesian_emotion_analysis_v11_pipeline` is a Indonesian model originally trained by Aardiiiiy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_base_indonesian_emotion_analysis_v11_pipeline_id_5.5.0_3.0_1726499515189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_base_indonesian_emotion_analysis_v11_pipeline_id_5.5.0_3.0_1726499515189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("indobert_base_indonesian_emotion_analysis_v11_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("indobert_base_indonesian_emotion_analysis_v11_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_base_indonesian_emotion_analysis_v11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|414.1 MB| + +## References + +https://huggingface.co/Aardiiiiy/indobert-base-Indonesian-emotion-analysis-v11 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indonesia_emotion_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-indonesia_emotion_roberta_pipeline_en.md new file mode 100644 index 00000000000000..c79016ce2aaf90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indonesia_emotion_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English indonesia_emotion_roberta_pipeline pipeline RoBertaForSequenceClassification from aniyfans +author: John Snow Labs +name: indonesia_emotion_roberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indonesia_emotion_roberta_pipeline` is a English model originally trained by aniyfans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indonesia_emotion_roberta_pipeline_en_5.5.0_3.0_1726518684112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indonesia_emotion_roberta_pipeline_en_5.5.0_3.0_1726518684112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("indonesia_emotion_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("indonesia_emotion_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indonesia_emotion_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/aniyfans/indonesia-emotion-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indonesian_sentiment_id.md b/docs/_posts/ahmedlone127/2024-09-16-indonesian_sentiment_id.md new file mode 100644 index 00000000000000..ecd5ef182e58b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indonesian_sentiment_id.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Indonesian indonesian_sentiment BertForSequenceClassification from taufiqdp +author: John Snow Labs +name: indonesian_sentiment +date: 2024-09-16 +tags: [id, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indonesian_sentiment` is a Indonesian model originally trained by taufiqdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indonesian_sentiment_id_5.5.0_3.0_1726493102587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indonesian_sentiment_id_5.5.0_3.0_1726493102587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("indonesian_sentiment","id") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("indonesian_sentiment", "id")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indonesian_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|id| +|Size:|413.9 MB| + +## References + +https://huggingface.co/taufiqdp/indonesian-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-indonesian_sentiment_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-16-indonesian_sentiment_pipeline_id.md new file mode 100644 index 00000000000000..707886a4e3e007 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-indonesian_sentiment_pipeline_id.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Indonesian indonesian_sentiment_pipeline pipeline BertForSequenceClassification from taufiqdp +author: John Snow Labs +name: indonesian_sentiment_pipeline +date: 2024-09-16 +tags: [id, open_source, pipeline, onnx] +task: Text Classification +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indonesian_sentiment_pipeline` is a Indonesian model originally trained by taufiqdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indonesian_sentiment_pipeline_id_5.5.0_3.0_1726493122740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indonesian_sentiment_pipeline_id_5.5.0_3.0_1726493122740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("indonesian_sentiment_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("indonesian_sentiment_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indonesian_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|414.0 MB| + +## References + +https://huggingface.co/taufiqdp/indonesian-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-irf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-irf_pipeline_en.md new file mode 100644 index 00000000000000..f3e2f0b758d4e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-irf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English irf_pipeline pipeline BertForSequenceClassification from Tianlin668 +author: John Snow Labs +name: irf_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`irf_pipeline` is a English model originally trained by Tianlin668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/irf_pipeline_en_5.5.0_3.0_1726459076814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/irf_pipeline_en_5.5.0_3.0_1726459076814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("irf_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("irf_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|irf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Tianlin668/Irf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-journal_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-journal_roberta_pipeline_en.md new file mode 100644 index 00000000000000..3d1d324cf9a5da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-journal_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English journal_roberta_pipeline pipeline RoBertaForSequenceClassification from KM4STfulltext +author: John Snow Labs +name: journal_roberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`journal_roberta_pipeline` is a English model originally trained by KM4STfulltext. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/journal_roberta_pipeline_en_5.5.0_3.0_1726470368824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/journal_roberta_pipeline_en_5.5.0_3.0_1726470368824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("journal_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("journal_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|journal_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.0 MB| + +## References + +https://huggingface.co/KM4STfulltext/Journal-RoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-kemmer_finetuned_russian_english_nan.md b/docs/_posts/ahmedlone127/2024-09-16-kemmer_finetuned_russian_english_nan.md new file mode 100644 index 00000000000000..ba68003ef991aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-kemmer_finetuned_russian_english_nan.md @@ -0,0 +1,94 @@ +--- +layout: model +title: None kemmer_finetuned_russian_english MarianTransformer from KemmerEdition +author: John Snow Labs +name: kemmer_finetuned_russian_english +date: 2024-09-16 +tags: [nan, open_source, onnx, translation, marian] +task: Translation +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kemmer_finetuned_russian_english` is a None model originally trained by KemmerEdition. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kemmer_finetuned_russian_english_nan_5.5.0_3.0_1726457023793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kemmer_finetuned_russian_english_nan_5.5.0_3.0_1726457023793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("kemmer_finetuned_russian_english","nan") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("kemmer_finetuned_russian_english","nan")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kemmer_finetuned_russian_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|nan| +|Size:|526.3 MB| + +## References + +https://huggingface.co/KemmerEdition/Kemmer_Finetuned_Ru_En \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-khipu_finetuned_amazon_reviews_multi_gcapde_en.md b/docs/_posts/ahmedlone127/2024-09-16-khipu_finetuned_amazon_reviews_multi_gcapde_en.md new file mode 100644 index 00000000000000..e6c13686d38d51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-khipu_finetuned_amazon_reviews_multi_gcapde_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English khipu_finetuned_amazon_reviews_multi_gcapde RoBertaForSequenceClassification from gcapde +author: John Snow Labs +name: khipu_finetuned_amazon_reviews_multi_gcapde +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khipu_finetuned_amazon_reviews_multi_gcapde` is a English model originally trained by gcapde. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_gcapde_en_5.5.0_3.0_1726470195953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_gcapde_en_5.5.0_3.0_1726470195953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("khipu_finetuned_amazon_reviews_multi_gcapde","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("khipu_finetuned_amazon_reviews_multi_gcapde", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
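+
+The predicted label for each row is written to the `class` column as an annotation. A minimal sketch of reading it, assuming the Python pipeline above:
+
+```python
+# Hypothetical follow-up to the pipeline above: show each text with its predicted label.
+pipelineDF.selectExpr("text", "class.result[0] as predicted_label").show(truncate=False)
+```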
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khipu_finetuned_amazon_reviews_multi_gcapde| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|428.7 MB| + +## References + +https://huggingface.co/gcapde/khipu-finetuned-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-khipu_finetuned_amazon_reviews_multi_gcapde_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-khipu_finetuned_amazon_reviews_multi_gcapde_pipeline_en.md new file mode 100644 index 00000000000000..956e0770d7efde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-khipu_finetuned_amazon_reviews_multi_gcapde_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English khipu_finetuned_amazon_reviews_multi_gcapde_pipeline pipeline RoBertaForSequenceClassification from gcapde +author: John Snow Labs +name: khipu_finetuned_amazon_reviews_multi_gcapde_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khipu_finetuned_amazon_reviews_multi_gcapde_pipeline` is a English model originally trained by gcapde. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_gcapde_pipeline_en_5.5.0_3.0_1726470223802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_gcapde_pipeline_en_5.5.0_3.0_1726470223802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khipu_finetuned_amazon_reviews_multi_gcapde_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khipu_finetuned_amazon_reviews_multi_gcapde_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
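+
+The snippet above assumes `df` is a Spark DataFrame with a `text` column. A minimal sketch of preparing such an input, plus the `annotate` shortcut for single strings (both assume the pipeline was loaded as shown above):
+
+```python
+# Hypothetical input DataFrame with the "text" column the pretrained pipeline expects.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# For quick experiments, annotate() runs the same stages on a single string.
+result = pipeline.annotate("I love spark-nlp")
+```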
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khipu_finetuned_amazon_reviews_multi_gcapde_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.8 MB| + +## References + +https://huggingface.co/gcapde/khipu-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_en.md b/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_en.md new file mode 100644 index 00000000000000..b1a13b0f3e0cc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3 RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_en_5.5.0_3.0_1726526903900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_en_5.5.0_3.0_1726526903900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-tweet-finetuned-kin-sent3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline_en.md new file mode 100644 index 00000000000000..c3abf138749865 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline pipeline RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline_en_5.5.0_3.0_1726526923641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline_en_5.5.0_3.0_1726526923641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-tweet-finetuned-kin-sent3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-lab1_finetuning_den_sota_en.md b/docs/_posts/ahmedlone127/2024-09-16-lab1_finetuning_den_sota_en.md new file mode 100644 index 00000000000000..e038ab6ca71f57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-lab1_finetuning_den_sota_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_finetuning_den_sota MarianTransformer from Den-sota +author: John Snow Labs +name: lab1_finetuning_den_sota +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_den_sota` is a English model originally trained by Den-sota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_den_sota_en_5.5.0_3.0_1726457393050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_den_sota_en_5.5.0_3.0_1726457393050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("lab1_finetuning_den_sota","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("lab1_finetuning_den_sota","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_den_sota| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/Den-sota/lab1_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-legal_bert_base_filtered_cuad_en.md b/docs/_posts/ahmedlone127/2024-09-16-legal_bert_base_filtered_cuad_en.md new file mode 100644 index 00000000000000..8371401e49e29f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-legal_bert_base_filtered_cuad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_bert_base_filtered_cuad BertForQuestionAnswering from alex-apostolo +author: John Snow Labs +name: legal_bert_base_filtered_cuad +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_base_filtered_cuad` is a English model originally trained by alex-apostolo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_base_filtered_cuad_en_5.5.0_3.0_1726511271299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_base_filtered_cuad_en_5.5.0_3.0_1726511271299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_base_filtered_cuad","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_base_filtered_cuad", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
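+
+The extracted answer span is returned in the `answer` column. A minimal sketch of reading it, assuming the Python pipeline above:
+
+```python
+# Hypothetical follow-up to the pipeline above: pair each question with its predicted answer.
+pipelineDF.selectExpr("question", "answer.result[0] as answer").show(truncate=False)
+```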
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_base_filtered_cuad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/alex-apostolo/legal-bert-base-filtered-cuad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-legal_bert_base_filtered_cuad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-legal_bert_base_filtered_cuad_pipeline_en.md new file mode 100644 index 00000000000000..80de14e534759d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-legal_bert_base_filtered_cuad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English legal_bert_base_filtered_cuad_pipeline pipeline BertForQuestionAnswering from alex-apostolo +author: John Snow Labs +name: legal_bert_base_filtered_cuad_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_base_filtered_cuad_pipeline` is a English model originally trained by alex-apostolo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_base_filtered_cuad_pipeline_en_5.5.0_3.0_1726511289803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_base_filtered_cuad_pipeline_en_5.5.0_3.0_1726511289803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_bert_base_filtered_cuad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_bert_base_filtered_cuad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_base_filtered_cuad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/alex-apostolo/legal-bert-base-filtered-cuad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-lenu_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-lenu_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..238efeab9b8c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-lenu_portuguese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenu_portuguese_pipeline pipeline BertForSequenceClassification from Sociovestix +author: John Snow Labs +name: lenu_portuguese_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenu_portuguese_pipeline` is a English model originally trained by Sociovestix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenu_portuguese_pipeline_en_5.5.0_3.0_1726460046181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenu_portuguese_pipeline_en_5.5.0_3.0_1726460046181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lenu_portuguese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lenu_portuguese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenu_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/Sociovestix/lenu_PT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-lettuce_sayula_popoluca_english_mono_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-lettuce_sayula_popoluca_english_mono_pipeline_en.md new file mode 100644 index 00000000000000..877192fae4e16b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-lettuce_sayula_popoluca_english_mono_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lettuce_sayula_popoluca_english_mono_pipeline pipeline RoBertaForTokenClassification from pranaydeeps +author: John Snow Labs +name: lettuce_sayula_popoluca_english_mono_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lettuce_sayula_popoluca_english_mono_pipeline` is a English model originally trained by pranaydeeps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lettuce_sayula_popoluca_english_mono_pipeline_en_5.5.0_3.0_1726452812239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lettuce_sayula_popoluca_english_mono_pipeline_en_5.5.0_3.0_1726452812239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lettuce_sayula_popoluca_english_mono_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lettuce_sayula_popoluca_english_mono_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lettuce_sayula_popoluca_english_mono_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.9 MB| + +## References + +https://huggingface.co/pranaydeeps/lettuce_pos_en_mono + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-log_sage_reward_model_en.md b/docs/_posts/ahmedlone127/2024-09-16-log_sage_reward_model_en.md new file mode 100644 index 00000000000000..6b06176813990e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-log_sage_reward_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English log_sage_reward_model DistilBertForSequenceClassification from IrwinD +author: John Snow Labs +name: log_sage_reward_model +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`log_sage_reward_model` is a English model originally trained by IrwinD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/log_sage_reward_model_en_5.5.0_3.0_1726525295966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/log_sage_reward_model_en_5.5.0_3.0_1726525295966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("log_sage_reward_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("log_sage_reward_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|log_sage_reward_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IrwinD/log_sage_reward_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_german_gender_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_german_gender_en.md new file mode 100644 index 00000000000000..797ed992ae3ea6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_german_gender_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_coref_english_german_gender MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_german_gender +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_german_gender` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_german_gender_en_5.5.0_3.0_1726491170175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_german_gender_en_5.5.0_3.0_1726491170175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("maltese_coref_english_german_gender","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("maltese_coref_english_german_gender","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
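+
+For low-latency translation of individual strings, the fitted pipeline can be wrapped in a LightPipeline instead of transforming a DataFrame. A minimal sketch, assuming the Python pipeline above has been fit:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Hypothetical follow-up: translate a single sentence without building a DataFrame.
+light = LightPipeline(pipelineModel)
+result = light.annotate("I love spark-nlp")
+print(result["translation"])
+```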
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_german_gender| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|499.8 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_de_gender \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_russian_coref_exp_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_russian_coref_exp_en.md new file mode 100644 index 00000000000000..c5a78845b6e7f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_russian_coref_exp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_coref_english_russian_coref_exp MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_russian_coref_exp +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_russian_coref_exp` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_russian_coref_exp_en_5.5.0_3.0_1726465248404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_russian_coref_exp_en_5.5.0_3.0_1726465248404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("maltese_coref_english_russian_coref_exp","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("maltese_coref_english_russian_coref_exp","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_russian_coref_exp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|525.9 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_ru_coref_exp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_pipeline_en.md new file mode 100644 index 00000000000000..f7ec9022353b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maltese_coref_english_spanish_gender_exp_pipeline pipeline MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_spanish_gender_exp_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_spanish_gender_exp_pipeline` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_spanish_gender_exp_pipeline_en_5.5.0_3.0_1726457319759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_spanish_gender_exp_pipeline_en_5.5.0_3.0_1726457319759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_coref_english_spanish_gender_exp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_coref_english_spanish_gender_exp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_spanish_gender_exp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.9 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_es_gender_exp + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marathi_sentiment_subtitles_mr.md b/docs/_posts/ahmedlone127/2024-09-16-marathi_sentiment_subtitles_mr.md new file mode 100644 index 00000000000000..8cecec49224cd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marathi_sentiment_subtitles_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi marathi_sentiment_subtitles BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_subtitles +date: 2024-09-16 +tags: [mr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_subtitles` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_subtitles_mr_5.5.0_3.0_1726498883827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_subtitles_mr_5.5.0_3.0_1726498883827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_subtitles","mr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_subtitles", "mr")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
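+
+Besides the predicted label, sequence-classification annotations usually carry per-class scores in their metadata. A minimal sketch of surfacing both, assuming the Python pipeline above:
+
+```python
+# Hypothetical follow-up: predicted sentiment label plus the raw score metadata map.
+pipelineDF.selectExpr("text", "class.result[0] as label", "class.metadata[0] as scores").show(truncate=False)
+```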
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_subtitles| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|mr| +|Size:|892.8 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-subtitles \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marathi_sentiment_subtitles_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-16-marathi_sentiment_subtitles_pipeline_mr.md new file mode 100644 index 00000000000000..262df54e77bae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marathi_sentiment_subtitles_pipeline_mr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Marathi marathi_sentiment_subtitles_pipeline pipeline BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_subtitles_pipeline +date: 2024-09-16 +tags: [mr, open_source, pipeline, onnx] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_subtitles_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_subtitles_pipeline_mr_5.5.0_3.0_1726498927043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_subtitles_pipeline_mr_5.5.0_3.0_1726498927043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marathi_sentiment_subtitles_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marathi_sentiment_subtitles_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_subtitles_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-subtitles + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_spanish_pipeline_en.md new file mode 100644 index 00000000000000..6aad13d5264058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_spanish_pipeline pipeline MarianTransformer from weirdMoonFace +author: John Snow Labs +name: marian_finetuned_kde4_english_spanish_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_spanish_pipeline` is a English model originally trained by weirdMoonFace. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_spanish_pipeline_en_5.5.0_3.0_1726491538381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_spanish_pipeline_en_5.5.0_3.0_1726491538381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.5 MB| + +## References + +https://huggingface.co/weirdMoonFace/marian-finetuned-kde4-en-es + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_en.md new file mode 100644 index 00000000000000..b589a101a3f0e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama MarianTransformer from tkoyama +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama` is a English model originally trained by tkoyama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_en_5.5.0_3.0_1726494292610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_en_5.5.0_3.0_1726494292610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/tkoyama/marian-finetuned-kde4-en-to-fr-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_en.md new file mode 100644 index 00000000000000..42c97a1c802f58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow MarianTransformer from vasaicrow +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow` is a English model originally trained by vasaicrow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_en_5.5.0_3.0_1726456953389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_en_5.5.0_3.0_1726456953389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|507.7 MB| + +## References + +https://huggingface.co/vasaicrow/marian-finetuned-kde4-en-to-fr-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_en.md new file mode 100644 index 00000000000000..8235ebca52bb26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd MarianTransformer from alfsnd +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd` is a English model originally trained by alfsnd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_en_5.5.0_3.0_1726493815494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_en_5.5.0_3.0_1726493815494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/alfsnd/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline_en.md new file mode 100644 index 00000000000000..a2c6de5df95831 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline pipeline MarianTransformer from alfsnd +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline` is a English model originally trained by alfsnd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline_en_5.5.0_3.0_1726493840418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline_en_5.5.0_3.0_1726493840418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
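+
+The `df` referenced above is not defined in the snippet; it is assumed to be a Spark DataFrame with a `text` column. A minimal sketch, including the single-string `annotate()` helper for quick checks:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline", lang="en")
+
+# Build the input DataFrame expected by pipeline.transform(df).
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# For a single string, annotate() returns a dict keyed by output column.
+result = pipeline.annotate("I love spark-nlp")
+```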
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_alfsnd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/alfsnd/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001_en.md new file mode 100644 index 00000000000000..d2598ae7383753 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001 MarianTransformer from chandlerzhang001 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001` is a English model originally trained by chandlerzhang001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001_en_5.5.0_3.0_1726503282391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001_en_5.5.0_3.0_1726503282391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_chandlerzhang001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/chandlerzhang001/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline_en.md new file mode 100644 index 00000000000000..fe8ada3b2b23d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline pipeline MarianTransformer from Gnjs +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline` is a English model originally trained by Gnjs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline_en_5.5.0_3.0_1726490850261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline_en_5.5.0_3.0_1726490850261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_gnjs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/Gnjs/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_en.md new file mode 100644 index 00000000000000..53ab3af8fbc514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw MarianTransformer from heyw +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw` is a English model originally trained by heyw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_en_5.5.0_3.0_1726503092251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_en_5.5.0_3.0_1726503092251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/heyw/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline_en.md new file mode 100644 index 00000000000000..3dc4639c961e29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline pipeline MarianTransformer from heyw +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline` is a English model originally trained by heyw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline_en_5.5.0_3.0_1726503116762.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline_en_5.5.0_3.0_1726503116762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_heyw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/heyw/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_en.md new file mode 100644 index 00000000000000..9132665ea687c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally MarianTransformer from ryannhally +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally` is a English model originally trained by ryannhally. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_en_5.5.0_3.0_1726491296540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_en_5.5.0_3.0_1726491296540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/ryannhally/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline_en.md new file mode 100644 index 00000000000000..a80375090f4ff7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline pipeline MarianTransformer from ryannhally +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline` is a English model originally trained by ryannhally. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline_en_5.5.0_3.0_1726491322107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline_en_5.5.0_3.0_1726491322107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ryannhally_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/ryannhally/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_en.md new file mode 100644 index 00000000000000..f94cbc1c07a292 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422 MarianTransformer from uo282422 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422` is a English model originally trained by uo282422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_en_5.5.0_3.0_1726491470940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_en_5.5.0_3.0_1726491470940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/uo282422/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline_en.md new file mode 100644 index 00000000000000..7ca7c67fcf4b79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline pipeline MarianTransformer from uo282422 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline` is a English model originally trained by uo282422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline_en_5.5.0_3.0_1726491496624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline_en_5.5.0_3.0_1726491496624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_uo282422_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/uo282422/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_japanese_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_japanese_en.md new file mode 100644 index 00000000000000..760443710c89ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_japanese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_japanese MarianTransformer from amacbee +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_japanese +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_japanese` is a English model originally trained by amacbee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_japanese_en_5.5.0_3.0_1726490999949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_japanese_en_5.5.0_3.0_1726490999949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_japanese","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_japanese","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
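+
+For low-latency, single-sentence translation it is common to wrap the fitted pipeline in a `LightPipeline` instead of building a DataFrame. A minimal sketch, assuming the `pipelineModel` fitted in the Python example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# annotate() returns a dict keyed by output column names ("document",
+# "sentence", "translation") with lists of string results as values.
+light = LightPipeline(pipelineModel)
+result = light.annotate("I love spark-nlp")
+print(result["translation"])
+```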
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|430.3 MB| + +## References + +https://huggingface.co/amacbee/marian-finetuned-kde4-en-to-ja \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_maltese_nigerian_pidgin_english_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_maltese_nigerian_pidgin_english_en.md new file mode 100644 index 00000000000000..a03cbee472c41a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_maltese_nigerian_pidgin_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_maltese_nigerian_pidgin_english MarianTransformer from Enutrof +author: John Snow Labs +name: marian_maltese_nigerian_pidgin_english +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_maltese_nigerian_pidgin_english` is a English model originally trained by Enutrof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_maltese_nigerian_pidgin_english_en_5.5.0_3.0_1726493761691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_maltese_nigerian_pidgin_english_en_5.5.0_3.0_1726493761691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_maltese_nigerian_pidgin_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_maltese_nigerian_pidgin_english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_maltese_nigerian_pidgin_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|498.7 MB| + +## References + +https://huggingface.co/Enutrof/marian-mt-pcm-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_maltese_rom_romanian_ro.md b/docs/_posts/ahmedlone127/2024-09-16-marian_maltese_rom_romanian_ro.md new file mode 100644 index 00000000000000..8e18a482c8bf8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_maltese_rom_romanian_ro.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian marian_maltese_rom_romanian MarianTransformer from IoanRazvan +author: John Snow Labs +name: marian_maltese_rom_romanian +date: 2024-09-16 +tags: [ro, open_source, onnx, translation, marian] +task: Translation +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_maltese_rom_romanian` is a Moldavian, Moldovan, Romanian model originally trained by IoanRazvan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_maltese_rom_romanian_ro_5.5.0_3.0_1726503027770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_maltese_rom_romanian_ro_5.5.0_3.0_1726503027770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_maltese_rom_romanian","ro") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_maltese_rom_romanian","ro")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_maltese_rom_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|ro| +|Size:|508.5 MB| + +## References + +https://huggingface.co/IoanRazvan/marian_mt_rom_ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_prelim_emea_enfr_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_prelim_emea_enfr_en.md new file mode 100644 index 00000000000000..6edd1983ea5a2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_prelim_emea_enfr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_prelim_emea_enfr MarianTransformer from ethansimrm +author: John Snow Labs +name: marian_prelim_emea_enfr +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_prelim_emea_enfr` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_prelim_emea_enfr_en_5.5.0_3.0_1726494311713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_prelim_emea_enfr_en_5.5.0_3.0_1726494311713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_prelim_emea_enfr","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_prelim_emea_enfr","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_prelim_emea_enfr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/ethansimrm/marian_prelim_emea_enfr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_prelim_emea_enfr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_prelim_emea_enfr_pipeline_en.md new file mode 100644 index 00000000000000..3ec6eb1a8fb023 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_prelim_emea_enfr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_prelim_emea_enfr_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: marian_prelim_emea_enfr_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_prelim_emea_enfr_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_prelim_emea_enfr_pipeline_en_5.5.0_3.0_1726494336506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_prelim_emea_enfr_pipeline_en_5.5.0_3.0_1726494336506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_prelim_emea_enfr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_prelim_emea_enfr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_prelim_emea_enfr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/ethansimrm/marian_prelim_emea_enfr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-medical_english_chinese_8_18_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-medical_english_chinese_8_18_pipeline_en.md new file mode 100644 index 00000000000000..bf43677e160609 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-medical_english_chinese_8_18_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English medical_english_chinese_8_18_pipeline pipeline MarianTransformer from DogGoesBark +author: John Snow Labs +name: medical_english_chinese_8_18_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_english_chinese_8_18_pipeline` is a English model originally trained by DogGoesBark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_english_chinese_8_18_pipeline_en_5.5.0_3.0_1726491413026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_english_chinese_8_18_pipeline_en_5.5.0_3.0_1726491413026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medical_english_chinese_8_18_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medical_english_chinese_8_18_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_english_chinese_8_18_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.8 MB| + +## References + +https://huggingface.co/DogGoesBark/medical_en_zh_8_18 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-mental_bert_base_uncased_finetuned_depression_en.md b/docs/_posts/ahmedlone127/2024-09-16-mental_bert_base_uncased_finetuned_depression_en.md new file mode 100644 index 00000000000000..d2f57659fd49b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-mental_bert_base_uncased_finetuned_depression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mental_bert_base_uncased_finetuned_depression BertForSequenceClassification from welsachy +author: John Snow Labs +name: mental_bert_base_uncased_finetuned_depression +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mental_bert_base_uncased_finetuned_depression` is a English model originally trained by welsachy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mental_bert_base_uncased_finetuned_depression_en_5.5.0_3.0_1726492646773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mental_bert_base_uncased_finetuned_depression_en_5.5.0_3.0_1726492646773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("mental_bert_base_uncased_finetuned_depression","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("mental_bert_base_uncased_finetuned_depression", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
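+
+To read the predictions from the example above, the `class` column can be selected directly, and the label set can be listed from the annotator. A minimal sketch, assuming the `pipelineDF` and `sequenceClassifier` objects defined in the Python example:
+
+```python
+# Each document yields one annotation in "class"; its `result` field holds
+# the predicted label string.
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# List the labels this classifier can emit.
+print(sequenceClassifier.getClasses())
+```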
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mental_bert_base_uncased_finetuned_depression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.8 MB| + +## References + +https://huggingface.co/welsachy/mental-bert-base-uncased-finetuned-depression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-mental_bert_base_uncased_finetuned_depression_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-mental_bert_base_uncased_finetuned_depression_pipeline_en.md new file mode 100644 index 00000000000000..8c33f38161399c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-mental_bert_base_uncased_finetuned_depression_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mental_bert_base_uncased_finetuned_depression_pipeline pipeline BertForSequenceClassification from welsachy +author: John Snow Labs +name: mental_bert_base_uncased_finetuned_depression_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mental_bert_base_uncased_finetuned_depression_pipeline` is a English model originally trained by welsachy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mental_bert_base_uncased_finetuned_depression_pipeline_en_5.5.0_3.0_1726492666807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mental_bert_base_uncased_finetuned_depression_pipeline_en_5.5.0_3.0_1726492666807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mental_bert_base_uncased_finetuned_depression_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mental_bert_base_uncased_finetuned_depression_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mental_bert_base_uncased_finetuned_depression_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/welsachy/mental-bert-base-uncased-finetuned-depression + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-mix2_japanese_english_helsinki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-mix2_japanese_english_helsinki_pipeline_en.md new file mode 100644 index 00000000000000..52e0c3469bdf80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-mix2_japanese_english_helsinki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mix2_japanese_english_helsinki_pipeline pipeline MarianTransformer from twieland +author: John Snow Labs +name: mix2_japanese_english_helsinki_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mix2_japanese_english_helsinki_pipeline` is a English model originally trained by twieland. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mix2_japanese_english_helsinki_pipeline_en_5.5.0_3.0_1726465224112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mix2_japanese_english_helsinki_pipeline_en_5.5.0_3.0_1726465224112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mix2_japanese_english_helsinki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mix2_japanese_english_helsinki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mix2_japanese_english_helsinki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|518.2 MB| + +## References + +https://huggingface.co/twieland/MIX2_ja-en_helsinki + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-multiple_languages_coptic_english_norm_group_greekified_en.md b/docs/_posts/ahmedlone127/2024-09-16-multiple_languages_coptic_english_norm_group_greekified_en.md new file mode 100644 index 00000000000000..dc6b280f885051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-multiple_languages_coptic_english_norm_group_greekified_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multiple_languages_coptic_english_norm_group_greekified MarianTransformer from megalaa +author: John Snow Labs +name: multiple_languages_coptic_english_norm_group_greekified +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiple_languages_coptic_english_norm_group_greekified` is a English model originally trained by megalaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiple_languages_coptic_english_norm_group_greekified_en_5.5.0_3.0_1726503214871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiple_languages_coptic_english_norm_group_greekified_en_5.5.0_3.0_1726503214871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("multiple_languages_coptic_english_norm_group_greekified","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("multiple_languages_coptic_english_norm_group_greekified","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiple_languages_coptic_english_norm_group_greekified| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|532.6 MB| + +## References + +https://huggingface.co/megalaa/mul-cop-en-norm-group-greekified \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-multiple_languages_coptic_english_norm_group_greekified_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-multiple_languages_coptic_english_norm_group_greekified_pipeline_en.md new file mode 100644 index 00000000000000..387ba27518f433 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-multiple_languages_coptic_english_norm_group_greekified_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multiple_languages_coptic_english_norm_group_greekified_pipeline pipeline MarianTransformer from megalaa +author: John Snow Labs +name: multiple_languages_coptic_english_norm_group_greekified_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiple_languages_coptic_english_norm_group_greekified_pipeline` is a English model originally trained by megalaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiple_languages_coptic_english_norm_group_greekified_pipeline_en_5.5.0_3.0_1726503240630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiple_languages_coptic_english_norm_group_greekified_pipeline_en_5.5.0_3.0_1726503240630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multiple_languages_coptic_english_norm_group_greekified_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multiple_languages_coptic_english_norm_group_greekified_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
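+
+The snippet above assumes a DataFrame `df` with a `text` column already exists. A small sketch of preparing one and running the downloaded pipeline on it (the sample sentence is arbitrary):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("multiple_languages_coptic_english_norm_group_greekified_pipeline", lang = "en")
+
+# Any DataFrame with a "text" column can be used as input.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+# Check annotations.columns to see which output column holds the translations.
+annotations.show(truncate=False)
+```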
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiple_languages_coptic_english_norm_group_greekified_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|533.1 MB| + +## References + +https://huggingface.co/megalaa/mul-cop-en-norm-group-greekified + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-n_distilbert_twitterfin_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-16-n_distilbert_twitterfin_padding20model_en.md new file mode 100644 index 00000000000000..f5fd94759901e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-n_distilbert_twitterfin_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding20model +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding20model_en_5.5.0_3.0_1726506212047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding20model_en_5.5.0_3.0_1726506212047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding20model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding20model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
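+
+A quick way to inspect the predicted labels, assuming the Python pipeline above was run as written:
+
+```python
+# `class` is an array of category annotations; `result` holds the predicted label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```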
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-n_distilbert_twitterfin_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-n_distilbert_twitterfin_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..82fa3e4d6c34cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-n_distilbert_twitterfin_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding20model_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding20model_pipeline_en_5.5.0_3.0_1726506223464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding20model_pipeline_en_5.5.0_3.0_1726506223464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding20model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_twitterfin_padding20model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
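+
+For quick checks on a single string, `PretrainedPipeline` also exposes `annotate`, which skips the DataFrame step entirely; a rough sketch (the sample text is arbitrary):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding20model_pipeline", lang = "en")
+# Returns a plain Python dict keyed by the pipeline's output columns.
+result = pipeline.annotate("Stocks rallied after the earnings call.")
+```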
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random0_seed0_twitter_roberta_base_dec2020_en.md b/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random0_seed0_twitter_roberta_base_dec2020_en.md new file mode 100644 index 00000000000000..f840f9883ecd57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random0_seed0_twitter_roberta_base_dec2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random0_seed0_twitter_roberta_base_dec2020 RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed0_twitter_roberta_base_dec2020 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed0_twitter_roberta_base_dec2020` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed0_twitter_roberta_base_dec2020_en_5.5.0_3.0_1726517972249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed0_twitter_roberta_base_dec2020_en_5.5.0_3.0_1726517972249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed0_twitter_roberta_base_dec2020","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed0_twitter_roberta_base_dec2020", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
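+
+When only a handful of texts need scoring, wrapping the fitted model in a `LightPipeline` avoids a full DataFrame pass; a sketch assuming the Python pipeline above has been fitted:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict of output columns; the predicted label appears under "class".
+light.annotate("I love spark-nlp")
+```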
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed0_twitter_roberta_base_dec2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed0-twitter-roberta-base-dec2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nerubios_roberta_base_bne_training_development_en.md b/docs/_posts/ahmedlone127/2024-09-16-nerubios_roberta_base_bne_training_development_en.md new file mode 100644 index 00000000000000..68ed8b6697445f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nerubios_roberta_base_bne_training_development_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerubios_roberta_base_bne_training_development RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: nerubios_roberta_base_bne_training_development +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerubios_roberta_base_bne_training_development` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_development_en_5.5.0_3.0_1726482867824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_development_en_5.5.0_3.0_1726482867824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("nerubios_roberta_base_bne_training_development","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("nerubios_roberta_base_bne_training_development", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
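+
+Token-level predictions can be read back by selecting the token and NER columns side by side; since one tag is produced per token, the two arrays line up positionally (a sketch, assuming the Python pipeline above was run as written):
+
+```python
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```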
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerubios_roberta_base_bne_training_development| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/NeRUBioS_RoBERTa_base_bne_Training_Development \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nerubios_roberta_base_bne_training_development_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-nerubios_roberta_base_bne_training_development_pipeline_en.md new file mode 100644 index 00000000000000..2d48922c5814eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nerubios_roberta_base_bne_training_development_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerubios_roberta_base_bne_training_development_pipeline pipeline RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: nerubios_roberta_base_bne_training_development_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerubios_roberta_base_bne_training_development_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_development_pipeline_en_5.5.0_3.0_1726482890469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_development_pipeline_en_5.5.0_3.0_1726482890469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerubios_roberta_base_bne_training_development_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerubios_roberta_base_bne_training_development_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
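+
+If character offsets are needed (for example, to highlight recognized entities in the original text), `fullAnnotate` returns complete annotation objects rather than just strings; a rough sketch:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("nerubios_roberta_base_bne_training_development_pipeline", lang = "en")
+# Each annotation exposes `begin`, `end`, `result`, and `metadata`.
+annotations = pipeline.fullAnnotate("I love spark-nlp")[0]
+```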
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerubios_roberta_base_bne_training_development_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/NeRUBioS_RoBERTa_base_bne_Training_Development + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-newspapers_procon_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-16-newspapers_procon_spanish_en.md new file mode 100644 index 00000000000000..d1a8c7dc6cbf3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-newspapers_procon_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English newspapers_procon_spanish BertForSequenceClassification from taniwasl +author: John Snow Labs +name: newspapers_procon_spanish +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`newspapers_procon_spanish` is a English model originally trained by taniwasl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/newspapers_procon_spanish_en_5.5.0_3.0_1726499300434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/newspapers_procon_spanish_en_5.5.0_3.0_1726499300434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("newspapers_procon_spanish","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("newspapers_procon_spanish", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
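+
+Beyond the label itself, each classification annotation typically carries per-class scores in its metadata; a sketch of surfacing them (assuming the Python pipeline above):
+
+```python
+from pyspark.sql.functions import explode
+
+pipelineDF.select(explode("class").alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```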
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|newspapers_procon_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/taniwasl/newspapers_procon_es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-newspapers_procon_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-newspapers_procon_spanish_pipeline_en.md new file mode 100644 index 00000000000000..190184d8ae19df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-newspapers_procon_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English newspapers_procon_spanish_pipeline pipeline BertForSequenceClassification from taniwasl +author: John Snow Labs +name: newspapers_procon_spanish_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`newspapers_procon_spanish_pipeline` is a English model originally trained by taniwasl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/newspapers_procon_spanish_pipeline_en_5.5.0_3.0_1726499320299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/newspapers_procon_spanish_pipeline_en_5.5.0_3.0_1726499320299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("newspapers_procon_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("newspapers_procon_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|newspapers_procon_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/taniwasl/newspapers_procon_es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nusabert_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-nusabert_large_pipeline_en.md new file mode 100644 index 00000000000000..4d35d487238b94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nusabert_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nusabert_large_pipeline pipeline BertEmbeddings from LazarusNLP +author: John Snow Labs +name: nusabert_large_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nusabert_large_pipeline` is a English model originally trained by LazarusNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nusabert_large_pipeline_en_5.5.0_3.0_1726468483967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nusabert_large_pipeline_en_5.5.0_3.0_1726468483967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nusabert_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nusabert_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
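+
+As with the other pretrained pipelines, `transform` expects a DataFrame with a `text` column. A sketch that also peeks at the produced vectors; note that the `embeddings` output column name is an assumption rather than something stated in this card:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("nusabert_large_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+result = pipeline.transform(df)
+# Output column name assumed; check result.columns if it differs.
+result.select("embeddings.embeddings").show()
+```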
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nusabert_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/LazarusNLP/NusaBERT-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_base_wce_random_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_base_wce_random_en.md new file mode 100644 index 00000000000000..c71d36c26aca3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_base_wce_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_base_wce_random MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_wce_random +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_wce_random` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_wce_random_en_5.5.0_3.0_1726503027985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_wce_random_en_5.5.0_3.0_1726503027985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_base_wce_random","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_base_wce_random","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
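+
+Multiple rows are translated independently in a single pass, so batching inputs is straightforward; a sketch reusing the fitted Python pipeline above (the second sentence is arbitrary):
+
+```python
+batch = spark.createDataFrame([["I love spark-nlp"], ["The weather is nice today."]]).toDF("text")
+pipelineModel.transform(batch).select("translation.result").show(truncate=False)
+```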
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_wce_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_wce_random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_english_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_english_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..86914fa4c8c5c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_english_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_english_vietnamese_pipeline pipeline MarianTransformer from Kudod +author: John Snow Labs +name: opus_english_vietnamese_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_english_vietnamese_pipeline` is a English model originally trained by Kudod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_english_vietnamese_pipeline_en_5.5.0_3.0_1726494122230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_english_vietnamese_pipeline_en_5.5.0_3.0_1726494122230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_english_vietnamese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_english_vietnamese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_english_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.7 MB| + +## References + +https://huggingface.co/Kudod/opus-en-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_en.md new file mode 100644 index 00000000000000..86570366629859 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng MarianTransformer from ketong3906 +author: John Snow Labs +name: opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng` is a English model originally trained by ketong3906. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_en_5.5.0_3.0_1726509829777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_en_5.5.0_3.0_1726509829777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
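+
+Because this model translates Chinese into English, a Chinese input sentence is more representative than the English placeholder; a sketch reusing the fitted Python pipeline above (the sample sentence is arbitrary):
+
+```python
+zh_data = spark.createDataFrame([["我非常喜欢使用 Spark NLP。"]]).toDF("text")
+pipelineModel.transform(zh_data).select("translation.result").show(truncate=False)
+```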
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.3 MB| + +## References + +https://huggingface.co/ketong3906/opus-mt-zh-en-finetuned-chn-to-eng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline_en.md new file mode 100644 index 00000000000000..fcb5d7d59d837f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline pipeline MarianTransformer from ketong3906 +author: John Snow Labs +name: opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline` is a English model originally trained by ketong3906. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline_en_5.5.0_3.0_1726509854178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline_en_5.5.0_3.0_1726509854178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_chinese_english_finetuned_chn_tonga_tonga_islands_eng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.8 MB| + +## References + +https://huggingface.co/ketong3906/opus-mt-zh-en-finetuned-chn-to-eng + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline_en.md new file mode 100644 index 00000000000000..607fb6baac834d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline pipeline MarianTransformer from DevAibest +author: John Snow Labs +name: opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline` is a English model originally trained by DevAibest. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline_en_5.5.0_3.0_1726503457261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline_en_5.5.0_3.0_1726503457261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/DevAibest/opus-mt-en-fr-finetuned-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline_en.md new file mode 100644 index 00000000000000..5b7cafd4a8b03e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline pipeline MarianTransformer from Eyesiga +author: John Snow Labs +name: opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline` is a English model originally trained by Eyesiga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline_en_5.5.0_3.0_1726457577877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline_en_5.5.0_3.0_1726457577877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|514.9 MB| + +## References + +https://huggingface.co/Eyesiga/opus-mt-en-lg-finetuned-en-to-lg-finetuned-en-to-lm + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_en.md new file mode 100644 index 00000000000000..5a143c018a645e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl MarianTransformer from zannabethl +author: John Snow Labs +name: opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl` is a English model originally trained by zannabethl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_en_5.5.0_3.0_1726493963988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_en_5.5.0_3.0_1726493963988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|499.5 MB| + +## References + +https://huggingface.co/zannabethl/opus-mt-en-de-finetuned-en-to-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline_en.md new file mode 100644 index 00000000000000..c591081f257271 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline pipeline MarianTransformer from zannabethl +author: John Snow Labs +name: opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline` is a English model originally trained by zannabethl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline_en_5.5.0_3.0_1726493988234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline_en_5.5.0_3.0_1726493988234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_german_finetuned_english_tonga_tonga_islands_german_zannabethl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|500.0 MB| + +## References + +https://huggingface.co/zannabethl/opus-mt-en-de-finetuned-en-to-de + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline_en.md new file mode 100644 index 00000000000000..e0c0ad0a89d052 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline pipeline MarianTransformer from bishalbaaniya +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline` is a English model originally trained by bishalbaaniya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline_en_5.5.0_3.0_1726457312109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline_en_5.5.0_3.0_1726457312109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_bishalbaaniya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/bishalbaaniya/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson_en.md new file mode 100644 index 00000000000000..087e3279a4b157 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson MarianTransformer from DiegoAlysson +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson` is a English model originally trained by DiegoAlysson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson_en_5.5.0_3.0_1726491028767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson_en_5.5.0_3.0_1726491028767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_diegoalysson| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/DiegoAlysson/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline_en.md new file mode 100644 index 00000000000000..96b645a001c6d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline pipeline MarianTransformer from fxshan +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline` is a English model originally trained by fxshan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline_en_5.5.0_3.0_1726491604736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline_en_5.5.0_3.0_1726491604736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
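+
+For a quick smoke test without assembling a DataFrame (`df` above is assumed to be any DataFrame with a `text` column), the pretrained pipeline can also be driven through `annotate`. A small sketch under that assumption; the `"translation"` key reflects the usual output column name of these translation pipelines and may differ:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # start a Spark session with Spark NLP loaded
+
+pipeline = PretrainedPipeline(
+    "opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline",
+    lang="en")
+
+# annotate() returns a dict mapping output column names to lists of strings;
+# the "translation" key is an assumption about this pipeline's column naming.
+result = pipeline.annotate("I love spark-nlp")
+print(result.get("translation"))
+```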
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/fxshan/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline_en.md new file mode 100644 index 00000000000000..60404e282fc811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline pipeline MarianTransformer from Shadman-Rohan +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline` is a English model originally trained by Shadman-Rohan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline_en_5.5.0_3.0_1726493946198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline_en_5.5.0_3.0_1726493946198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_shadman_rohan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/Shadman-Rohan/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23_en.md new file mode 100644 index 00000000000000..1048b868061808 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23 MarianTransformer from SY23 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23` is a English model originally trained by SY23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23_en_5.5.0_3.0_1726457869457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23_en_5.5.0_3.0_1726457869457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_sy23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/SY23/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline_en.md new file mode 100644 index 00000000000000..5d186d5c58ddb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline pipeline MarianTransformer from yeshanp +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline` is a English model originally trained by yeshanp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline_en_5.5.0_3.0_1726509955573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline_en_5.5.0_3.0_1726509955573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/yeshanp/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_en.md new file mode 100644 index 00000000000000..5165de69df03d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus MarianTransformer from PontifexMaximus +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus` is a English model originally trained by PontifexMaximus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_en_5.5.0_3.0_1726503446089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_en_5.5.0_3.0_1726503446089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|495.8 MB| + +## References + +https://huggingface.co/PontifexMaximus/opus-mt-en-ro-finetuned-ro-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline_en.md new file mode 100644 index 00000000000000..aaa1e997882194 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline pipeline MarianTransformer from PontifexMaximus +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline` is a English model originally trained by PontifexMaximus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline_en_5.5.0_3.0_1726503470797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline_en_5.5.0_3.0_1726503470797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_pontifexmaximus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|496.3 MB| + +## References + +https://huggingface.co/PontifexMaximus/opus-mt-en-ro-finetuned-ro-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_russian_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_russian_finetuned_en.md new file mode 100644 index 00000000000000..5b114bb16bb1eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_russian_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_russian_finetuned MarianTransformer from kazandaev +author: John Snow Labs +name: opus_maltese_english_russian_finetuned +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_russian_finetuned` is a English model originally trained by kazandaev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_en_5.5.0_3.0_1726503229391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_en_5.5.0_3.0_1726503229391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_english_russian_finetuned","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_russian_finetuned","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_russian_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|526.5 MB| + +## References + +https://huggingface.co/kazandaev/opus-mt-en-ru-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_russian_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_russian_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..ec122814c87c77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_russian_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_russian_finetuned_pipeline pipeline MarianTransformer from kazandaev +author: John Snow Labs +name: opus_maltese_english_russian_finetuned_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_russian_finetuned_pipeline` is a English model originally trained by kazandaev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_pipeline_en_5.5.0_3.0_1726503253744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_pipeline_en_5.5.0_3.0_1726503253744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_russian_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_russian_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_russian_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|527.1 MB| + +## References + +https://huggingface.co/kazandaev/opus-mt-en-ru-finetuned + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz_en.md new file mode 100644 index 00000000000000..d86ba494d59d1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz MarianTransformer from mekjr1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz` is a English model originally trained by mekjr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz_en_5.5.0_3.0_1726494372030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz_en_5.5.0_3.0_1726494372030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_azz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/mekjr1/opus-mt-en-es-finetuned-es-to-azz \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_en.md new file mode 100644 index 00000000000000..c08d21908f31ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol MarianTransformer from uriel +author: John Snow Labs +name: opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol` is a English model originally trained by uriel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_en_5.5.0_3.0_1726494415295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_en_5.5.0_3.0_1726494415295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|517.6 MB| + +## References + +https://huggingface.co/uriel/opus-mt-fr-de-new-finetuned-fr-to-wol \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline_en.md new file mode 100644 index 00000000000000..cfbb312d39d560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline pipeline MarianTransformer from uriel +author: John Snow Labs +name: opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline` is a English model originally trained by uriel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline_en_5.5.0_3.0_1726494439926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline_en_5.5.0_3.0_1726494439926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_french_german_nepal_bhasa_finetuned_french_tonga_tonga_islands_wol_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|518.1 MB| + +## References + +https://huggingface.co/uriel/opus-mt-fr-de-new-finetuned-fr-to-wol + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_en.md new file mode 100644 index 00000000000000..ebaac55f98d591 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_indonesian_english_ccmatrix_warmup_best_loss MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_indonesian_english_ccmatrix_warmup_best_loss +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_indonesian_english_ccmatrix_warmup_best_loss` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_en_5.5.0_3.0_1726509742084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_en_5.5.0_3.0_1726509742084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_indonesian_english_ccmatrix_warmup_best_loss","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_indonesian_english_ccmatrix_warmup_best_loss","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_indonesian_english_ccmatrix_warmup_best_loss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|480.4 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-id-en-ccmatrix-warmup-best-loss \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline_en.md new file mode 100644 index 00000000000000..a60bdf05c07560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline pipeline MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline_en_5.5.0_3.0_1726509767633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline_en_5.5.0_3.0_1726509767633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_indonesian_english_ccmatrix_warmup_best_loss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|480.9 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-id-en-ccmatrix-warmup-best-loss + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_en.md new file mode 100644 index 00000000000000..6c3d758ef6ab29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4 MarianTransformer from astrojihye +author: John Snow Labs +name: opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4` is a English model originally trained by astrojihye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_en_5.5.0_3.0_1726503381528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_en_5.5.0_3.0_1726503381528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.6 MB| + +## References + +https://huggingface.co/astrojihye/opus-mt-ko-en-finetuned-ko-to-en4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline_en.md new file mode 100644 index 00000000000000..402100062aaf03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline pipeline MarianTransformer from astrojihye +author: John Snow Labs +name: opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline` is a English model originally trained by astrojihye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline_en_5.5.0_3.0_1726503407075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline_en_5.5.0_3.0_1726503407075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.2 MB| + +## References + +https://huggingface.co/astrojihye/opus-mt-ko-en-finetuned-ko-to-en4 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup_en.md new file mode 100644 index 00000000000000..b4c74477eb1e09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup MarianTransformer from snisioi +author: John Snow Labs +name: opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup` is a English model originally trained by snisioi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup_en_5.5.0_3.0_1726491652351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup_en_5.5.0_3.0_1726491652351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split the document into sentences before translation
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+# translate each detected sentence
+marian = MarianTransformer.pretrained("opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_romanian_french_finetuned_romanian_tonga_tonga_islands_rup| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|492.1 MB| + +## References + +https://huggingface.co/snisioi/opus-mt-ro-fr-finetuned-ro-to-rup \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md new file mode 100644 index 00000000000000..5ee35e06f33760 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline pipeline MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline_en_5.5.0_3.0_1726491207254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline_en_5.5.0_3.0_1726491207254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|519.2 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-sem-en-finetuned-npomo-en-15-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline_en.md new file mode 100644 index 00000000000000..f8218905415656 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline pipeline MarianTransformer from stopdoingmath +author: John Snow Labs +name: opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline` is a English model originally trained by stopdoingmath. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1726490848525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline_en_5.5.0_3.0_1726490848525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_slavic_languages_english_finetuned_ukrainian_tonga_tonga_islands_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|506.0 MB| + +## References + +https://huggingface.co/stopdoingmath/opus-mt-sla-en-finetuned-uk-to-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_en.md new file mode 100644 index 00000000000000..562ac7bfc80c41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3 MarianTransformer from mohamedtolba +author: John Snow Labs +name: opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3` is a English model originally trained by mohamedtolba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_en_5.5.0_3.0_1726494291975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_en_5.5.0_3.0_1726494291975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mohamedtolba/opus-mt-tc-big-en-ar-finetuned-franco-to-arabic-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline_en.md new file mode 100644 index 00000000000000..6c8f5ff836c036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline pipeline MarianTransformer from mohamedtolba +author: John Snow Labs +name: opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline` is a English model originally trained by mohamedtolba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline_en_5.5.0_3.0_1726494366047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline_en_5.5.0_3.0_1726494366047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_tc_big_english_arabic_finetuned_franco_tonga_tonga_islands_arabic_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mohamedtolba/opus-mt-tc-big-en-ar-finetuned-franco-to-arabic-3 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_en.md new file mode 100644 index 00000000000000..c75544c7d3ec7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french MarianTransformer from DevAibest +author: John Snow Labs +name: opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french` is a English model originally trained by DevAibest. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_en_5.5.0_3.0_1726494079090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_en_5.5.0_3.0_1726494079090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DevAibest/opus-mt-tc-big-finetuned-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_wmt_finetuned_enfr_ailem_2021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_wmt_finetuned_enfr_ailem_2021_pipeline_en.md new file mode 100644 index 00000000000000..877db558f2c56d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_wmt_finetuned_enfr_ailem_2021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_wmt_finetuned_enfr_ailem_2021_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_wmt_finetuned_enfr_ailem_2021_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_wmt_finetuned_enfr_ailem_2021_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_wmt_finetuned_enfr_ailem_2021_pipeline_en_5.5.0_3.0_1726490850500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_wmt_finetuned_enfr_ailem_2021_pipeline_en_5.5.0_3.0_1726490850500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_wmt_finetuned_enfr_ailem_2021_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_wmt_finetuned_enfr_ailem_2021_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_wmt_finetuned_enfr_ailem_2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/ethansimrm/opus_wmt_finetuned_enfr_ailem_2021 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-parapred_english_french_en.md b/docs/_posts/ahmedlone127/2024-09-16-parapred_english_french_en.md new file mode 100644 index 00000000000000..b446acb8839767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-parapred_english_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English parapred_english_french MarianTransformer from Ghani-25 +author: John Snow Labs +name: parapred_english_french +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`parapred_english_french` is a English model originally trained by Ghani-25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/parapred_english_french_en_5.5.0_3.0_1726494125071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/parapred_english_french_en_5.5.0_3.0_1726494125071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("parapred_english_french","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("parapred_english_french","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|parapred_english_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/Ghani-25/parapred_en_fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-parsingmodel_en.md b/docs/_posts/ahmedlone127/2024-09-16-parsingmodel_en.md new file mode 100644 index 00000000000000..b185fa55a4b033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-parsingmodel_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English parsingmodel RoBertaForQuestionAnswering from satya7064 +author: John Snow Labs +name: parsingmodel +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`parsingmodel` is a English model originally trained by satya7064. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/parsingmodel_en_5.5.0_3.0_1726460338808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/parsingmodel_en_5.5.0_3.0_1726460338808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("parsingmodel","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("parsingmodel", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|parsingmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/satya7064/parsingmodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-passage_ranker_v1_xs_multilingual_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-16-passage_ranker_v1_xs_multilingual_pipeline_xx.md new file mode 100644 index 00000000000000..cf533a837a2e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-passage_ranker_v1_xs_multilingual_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual passage_ranker_v1_xs_multilingual_pipeline pipeline BertForSequenceClassification from sinequa +author: John Snow Labs +name: passage_ranker_v1_xs_multilingual_pipeline +date: 2024-09-16 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`passage_ranker_v1_xs_multilingual_pipeline` is a Multilingual model originally trained by sinequa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/passage_ranker_v1_xs_multilingual_pipeline_xx_5.5.0_3.0_1726492793169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/passage_ranker_v1_xs_multilingual_pipeline_xx_5.5.0_3.0_1726492793169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("passage_ranker_v1_xs_multilingual_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("passage_ranker_v1_xs_multilingual_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|passage_ranker_v1_xs_multilingual_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|61.1 MB| + +## References + +https://huggingface.co/sinequa/passage-ranker-v1-XS-multilingual + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-passage_ranker_v1_xs_multilingual_xx.md b/docs/_posts/ahmedlone127/2024-09-16-passage_ranker_v1_xs_multilingual_xx.md new file mode 100644 index 00000000000000..2425ef33f75ba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-passage_ranker_v1_xs_multilingual_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual passage_ranker_v1_xs_multilingual BertForSequenceClassification from sinequa +author: John Snow Labs +name: passage_ranker_v1_xs_multilingual +date: 2024-09-16 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`passage_ranker_v1_xs_multilingual` is a Multilingual model originally trained by sinequa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/passage_ranker_v1_xs_multilingual_xx_5.5.0_3.0_1726492789595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/passage_ranker_v1_xs_multilingual_xx_5.5.0_3.0_1726492789595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("passage_ranker_v1_xs_multilingual","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("passage_ranker_v1_xs_multilingual", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|passage_ranker_v1_xs_multilingual| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|61.0 MB| + +## References + +https://huggingface.co/sinequa/passage-ranker-v1-XS-multilingual \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-phrasebank_fine_tuned_twitter_base_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-16-phrasebank_fine_tuned_twitter_base_roberta_en.md new file mode 100644 index 00000000000000..66ae0cf966a447 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-phrasebank_fine_tuned_twitter_base_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_fine_tuned_twitter_base_roberta RoBertaForSequenceClassification from steve1989 +author: John Snow Labs +name: phrasebank_fine_tuned_twitter_base_roberta +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_fine_tuned_twitter_base_roberta` is a English model originally trained by steve1989. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_fine_tuned_twitter_base_roberta_en_5.5.0_3.0_1726518105068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_fine_tuned_twitter_base_roberta_en_5.5.0_3.0_1726518105068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("phrasebank_fine_tuned_twitter_base_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("phrasebank_fine_tuned_twitter_base_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_fine_tuned_twitter_base_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/steve1989/phrasebank-fine-tuned-twitter-base-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-projectta2_en.md b/docs/_posts/ahmedlone127/2024-09-16-projectta2_en.md new file mode 100644 index 00000000000000..f32989d0b96bb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-projectta2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English projectta2 MarianTransformer from Bekicot +author: John Snow Labs +name: projectta2 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`projectta2` is a English model originally trained by Bekicot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/projectta2_en_5.5.0_3.0_1726509954120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/projectta2_en_5.5.0_3.0_1726509954120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("projectta2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("projectta2","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|projectta2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|479.8 MB| + +## References + +https://huggingface.co/Bekicot/ProjectTa2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-qa_model_shudhanshu009_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-qa_model_shudhanshu009_pipeline_en.md new file mode 100644 index 00000000000000..8d4616d4554715 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-qa_model_shudhanshu009_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_model_shudhanshu009_pipeline pipeline DistilBertForQuestionAnswering from shudhanshu009 +author: John Snow Labs +name: qa_model_shudhanshu009_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_shudhanshu009_pipeline` is a English model originally trained by shudhanshu009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_shudhanshu009_pipeline_en_5.5.0_3.0_1726515060673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_shudhanshu009_pipeline_en_5.5.0_3.0_1726515060673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_model_shudhanshu009_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_model_shudhanshu009_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_shudhanshu009_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/shudhanshu009/qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-qa_squad_roberta_based_en.md b/docs/_posts/ahmedlone127/2024-09-16-qa_squad_roberta_based_en.md new file mode 100644 index 00000000000000..cb944ec036977b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-qa_squad_roberta_based_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_squad_roberta_based RoBertaForQuestionAnswering from gasmartin +author: John Snow Labs +name: qa_squad_roberta_based +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_squad_roberta_based` is a English model originally trained by gasmartin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_squad_roberta_based_en_5.5.0_3.0_1726460345266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_squad_roberta_based_en_5.5.0_3.0_1726460345266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("qa_squad_roberta_based","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("qa_squad_roberta_based", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_squad_roberta_based| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|447.1 MB| + +## References + +https://huggingface.co/gasmartin/qa-squad-roberta-based \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-question_answering_finetuned_elisa_en.md b/docs/_posts/ahmedlone127/2024-09-16-question_answering_finetuned_elisa_en.md new file mode 100644 index 00000000000000..e8e3c6b58abeb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-question_answering_finetuned_elisa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English question_answering_finetuned_elisa DistilBertForQuestionAnswering from Elisa +author: John Snow Labs +name: question_answering_finetuned_elisa +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_finetuned_elisa` is a English model originally trained by Elisa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_finetuned_elisa_en_5.5.0_3.0_1726469336232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_finetuned_elisa_en_5.5.0_3.0_1726469336232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answering_finetuned_elisa","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answering_finetuned_elisa", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_finetuned_elisa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Elisa/question_answering_finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-question_answering_finetuned_elisa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-question_answering_finetuned_elisa_pipeline_en.md new file mode 100644 index 00000000000000..1c58354e0ac1c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-question_answering_finetuned_elisa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English question_answering_finetuned_elisa_pipeline pipeline DistilBertForQuestionAnswering from Elisa +author: John Snow Labs +name: question_answering_finetuned_elisa_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_finetuned_elisa_pipeline` is a English model originally trained by Elisa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_finetuned_elisa_pipeline_en_5.5.0_3.0_1726469348100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_finetuned_elisa_pipeline_en_5.5.0_3.0_1726469348100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("question_answering_finetuned_elisa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("question_answering_finetuned_elisa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_finetuned_elisa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Elisa/question_answering_finetuned + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-replica_classification_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-16-replica_classification_pipeline_ru.md new file mode 100644 index 00000000000000..0d4070e488b7bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-replica_classification_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian replica_classification_pipeline pipeline XlmRoBertaForSequenceClassification from TeraSpace +author: John Snow Labs +name: replica_classification_pipeline +date: 2024-09-16 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`replica_classification_pipeline` is a Russian model originally trained by TeraSpace. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/replica_classification_pipeline_ru_5.5.0_3.0_1726516898459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/replica_classification_pipeline_ru_5.5.0_3.0_1726516898459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("replica_classification_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("replica_classification_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|replica_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|524.5 MB| + +## References + +https://huggingface.co/TeraSpace/replica_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-replica_classification_ru.md b/docs/_posts/ahmedlone127/2024-09-16-replica_classification_ru.md new file mode 100644 index 00000000000000..0ad5538cd0a3e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-replica_classification_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian replica_classification XlmRoBertaForSequenceClassification from TeraSpace +author: John Snow Labs +name: replica_classification +date: 2024-09-16 +tags: [ru, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`replica_classification` is a Russian model originally trained by TeraSpace. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/replica_classification_ru_5.5.0_3.0_1726516726879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/replica_classification_ru_5.5.0_3.0_1726516726879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("replica_classification","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("replica_classification", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|replica_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|524.5 MB| + +## References + +https://huggingface.co/TeraSpace/replica_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-results_en.md b/docs/_posts/ahmedlone127/2024-09-16-results_en.md new file mode 100644 index 00000000000000..02e44bbf8cefc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-results_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results DistilBertForQuestionAnswering from Souvik123 +author: John Snow Labs +name: results +date: 2024-09-16 +tags: [distilbert, en, open_source, question_answering, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results` is a English model originally trained by Souvik123. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_en_5.5.0_3.0_1726529647683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_en_5.5.0_3.0_1726529647683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + + +spanClassifier = DistilBertForQuestionAnswering.pretrained("results","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([document_assembler, spanClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val document_assembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering + .pretrained("results", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(document_assembler, spanClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|307.0 MB| + +## References + +References + +References + +https://huggingface.co/Souvik123/results \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-results_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-results_pipeline_en.md new file mode 100644 index 00000000000000..b4cb42f94b4a8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-results_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_pipeline pipeline RoBertaForTokenClassification from danielyoo +author: John Snow Labs +name: results_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_pipeline` is a English model originally trained by danielyoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_pipeline_en_5.5.0_3.0_1726529662518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_pipeline_en_5.5.0_3.0_1726529662518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|307.0 MB| + +## References + +https://huggingface.co/danielyoo/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu_en.md new file mode 100644 index 00000000000000..2ffd90817abf4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu RoBertaForSequenceClassification from gonchisi +author: John Snow Labs +name: roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu` is a English model originally trained by gonchisi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu_en_5.5.0_3.0_1726470680607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu_en_5.5.0_3.0_1726470680607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_nepal_bhasa_oriya_used_warranty_gpu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|428.6 MB| + +## References + +https://huggingface.co/gonchisi/roberta-base-bne-finetuned-new_or_used_warranty-gpu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_toxicity_tweets_25000_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_toxicity_tweets_25000_en.md new file mode 100644 index 00000000000000..dbd6aab03ef4c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_toxicity_tweets_25000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_toxicity_tweets_25000 RoBertaForSequenceClassification from tonirodriguez +author: John Snow Labs +name: roberta_base_bne_finetuned_toxicity_tweets_25000 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_toxicity_tweets_25000` is a English model originally trained by tonirodriguez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_toxicity_tweets_25000_en_5.5.0_3.0_1726504568284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_toxicity_tweets_25000_en_5.5.0_3.0_1726504568284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_toxicity_tweets_25000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_toxicity_tweets_25000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_toxicity_tweets_25000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/tonirodriguez/roberta-base-bne-finetuned-toxicity-tweets-25000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline_en.md new file mode 100644 index 00000000000000..5d41111c486640 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline pipeline RoBertaForSequenceClassification from tonirodriguez +author: John Snow Labs +name: roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline` is a English model originally trained by tonirodriguez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline_en_5.5.0_3.0_1726504592648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline_en_5.5.0_3.0_1726504592648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
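+
+For a quick check on a single string, the pretrained pipeline can also be used without building a DataFrame. This is a minimal sketch, assuming the `pipeline` object created above; `annotate` returns a dictionary keyed by the pipeline's output columns:
+
+```python
+# Run the pipeline on one string and read the predicted label from the "class" output.
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```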
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_toxicity_tweets_25000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/tonirodriguez/roberta-base-bne-finetuned-toxicity-tweets-25000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_conversation_classification_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_conversation_classification_en.md new file mode 100644 index 00000000000000..dbcbdf659d600d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_conversation_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_conversation_classification RoBertaForSequenceClassification from muthuramkumar +author: John Snow Labs +name: roberta_base_conversation_classification +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_conversation_classification` is a English model originally trained by muthuramkumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_conversation_classification_en_5.5.0_3.0_1726518879234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_conversation_classification_en_5.5.0_3.0_1726518879234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_conversation_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_conversation_classification", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_conversation_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|420.0 MB| + +## References + +https://huggingface.co/muthuramkumar/roberta-base-conversation-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_cookdial_v1_1_kedudzic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_cookdial_v1_1_kedudzic_pipeline_en.md new file mode 100644 index 00000000000000..3bf5585e94dd57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_cookdial_v1_1_kedudzic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_cookdial_v1_1_kedudzic_pipeline pipeline RoBertaForSequenceClassification from kedudzic +author: John Snow Labs +name: roberta_base_cookdial_v1_1_kedudzic_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cookdial_v1_1_kedudzic_pipeline` is a English model originally trained by kedudzic. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cookdial_v1_1_kedudzic_pipeline_en_5.5.0_3.0_1726528070882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cookdial_v1_1_kedudzic_pipeline_en_5.5.0_3.0_1726528070882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_cookdial_v1_1_kedudzic_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_cookdial_v1_1_kedudzic_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cookdial_v1_1_kedudzic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|419.1 MB| + +## References + +https://huggingface.co/kedudzic/roberta-base-cookdial-v1_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_e_snli_classification_nli_explanation_base_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_e_snli_classification_nli_explanation_base_en.md new file mode 100644 index 00000000000000..37840cf34ec92d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_e_snli_classification_nli_explanation_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_e_snli_classification_nli_explanation_base RoBertaForSequenceClassification from k4black +author: John Snow Labs +name: roberta_base_e_snli_classification_nli_explanation_base +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_e_snli_classification_nli_explanation_base` is a English model originally trained by k4black. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_e_snli_classification_nli_explanation_base_en_5.5.0_3.0_1726518527904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_e_snli_classification_nli_explanation_base_en_5.5.0_3.0_1726518527904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_e_snli_classification_nli_explanation_base","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_e_snli_classification_nli_explanation_base", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_e_snli_classification_nli_explanation_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|452.9 MB| + +## References + +https://huggingface.co/k4black/roberta-base-e-snli-classification-nli_explanation-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_e_snli_classification_nli_explanation_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_e_snli_classification_nli_explanation_base_pipeline_en.md new file mode 100644 index 00000000000000..58fe41f4c2e97d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_e_snli_classification_nli_explanation_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_e_snli_classification_nli_explanation_base_pipeline pipeline RoBertaForSequenceClassification from k4black +author: John Snow Labs +name: roberta_base_e_snli_classification_nli_explanation_base_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_e_snli_classification_nli_explanation_base_pipeline` is a English model originally trained by k4black. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_e_snli_classification_nli_explanation_base_pipeline_en_5.5.0_3.0_1726518555263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_e_snli_classification_nli_explanation_base_pipeline_en_5.5.0_3.0_1726518555263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_e_snli_classification_nli_explanation_base_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_e_snli_classification_nli_explanation_base_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_e_snli_classification_nli_explanation_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|452.9 MB| + +## References + +https://huggingface.co/k4black/roberta-base-e-snli-classification-nli_explanation-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_edos_c_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_edos_c_en.md new file mode 100644 index 00000000000000..3d9463bdd8fffb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_edos_c_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_edos_c RoBertaForSequenceClassification from ISEGURA +author: John Snow Labs +name: roberta_base_edos_c +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_edos_c` is a English model originally trained by ISEGURA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_edos_c_en_5.5.0_3.0_1726527559759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_edos_c_en_5.5.0_3.0_1726527559759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_edos_c","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_edos_c", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_edos_c| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.0 MB| + +## References + +https://huggingface.co/ISEGURA/roberta-base_edos_c \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_edos_c_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_edos_c_pipeline_en.md new file mode 100644 index 00000000000000..aac91ca144a432 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_edos_c_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_edos_c_pipeline pipeline RoBertaForSequenceClassification from ISEGURA +author: John Snow Labs +name: roberta_base_edos_c_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_edos_c_pipeline` is a English model originally trained by ISEGURA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_edos_c_pipeline_en_5.5.0_3.0_1726527584811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_edos_c_pipeline_en_5.5.0_3.0_1726527584811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_edos_c_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_edos_c_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_edos_c_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.0 MB| + +## References + +https://huggingface.co/ISEGURA/roberta-base_edos_c + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en.md new file mode 100644 index 00000000000000..adb9252fb857f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline pipeline RoBertaForSequenceClassification from ben-yu +author: John Snow Labs +name: roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en_5.5.0_3.0_1726518630745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en_5.5.0_3.0_1726518630745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/ben-yu/roberta-base-finetuned-nlp-letters-s1_s2-all-class-weighted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_squad_harveyagraphcore_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_squad_harveyagraphcore_en.md new file mode 100644 index 00000000000000..15494c0fca81c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_squad_harveyagraphcore_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_harveyagraphcore RoBertaForQuestionAnswering from harveyagraphcore +author: John Snow Labs +name: roberta_base_finetuned_squad_harveyagraphcore +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_harveyagraphcore` is a English model originally trained by harveyagraphcore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_harveyagraphcore_en_5.5.0_3.0_1726501891140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_harveyagraphcore_en_5.5.0_3.0_1726501891140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_harveyagraphcore","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_harveyagraphcore", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
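+
+After the pipeline has been applied, the extracted answer span is available in the `answer` annotation column. A minimal sketch for reading it out, assuming the `pipelineDF` produced above and the standard Spark NLP annotation schema:
+
+```python
+# The `result` field of each "answer" annotation contains the predicted answer text.
+pipelineDF.select("answer.result").show(truncate=False)
+```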
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_harveyagraphcore| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|298.3 MB| + +## References + +https://huggingface.co/harveyagraphcore/roberta-base-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_squad_harveyagraphcore_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_squad_harveyagraphcore_pipeline_en.md new file mode 100644 index 00000000000000..3e122ac61afd64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_finetuned_squad_harveyagraphcore_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_harveyagraphcore_pipeline pipeline RoBertaForQuestionAnswering from harveyagraphcore +author: John Snow Labs +name: roberta_base_finetuned_squad_harveyagraphcore_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_harveyagraphcore_pipeline` is a English model originally trained by harveyagraphcore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_harveyagraphcore_pipeline_en_5.5.0_3.0_1726501980538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_harveyagraphcore_pipeline_en_5.5.0_3.0_1726501980538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input; the "question"/"context" column names are assumed to match the pipeline's inputs
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+
+pipeline = PretrainedPipeline("roberta_base_finetuned_squad_harveyagraphcore_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input; the "question"/"context" column names are assumed to match the pipeline's inputs
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+
+val pipeline = new PretrainedPipeline("roberta_base_finetuned_squad_harveyagraphcore_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_harveyagraphcore_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.4 MB| + +## References + +https://huggingface.co/harveyagraphcore/roberta-base-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_hoax_classifier_fulltext_v1_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_hoax_classifier_fulltext_v1_en.md new file mode 100644 index 00000000000000..b36bf045f32758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_hoax_classifier_fulltext_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_fulltext_v1 RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_fulltext_v1 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_fulltext_v1` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_fulltext_v1_en_5.5.0_3.0_1726470312770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_fulltext_v1_en_5.5.0_3.0_1726470312770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_fulltext_v1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_fulltext_v1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_fulltext_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_fulltext_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_leandroaraujodev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_leandroaraujodev_pipeline_en.md new file mode 100644 index 00000000000000..8e6f49a88e52c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_leandroaraujodev_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_leandroaraujodev_pipeline pipeline RoBertaForSequenceClassification from leandroaraujodev +author: John Snow Labs +name: roberta_base_leandroaraujodev_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_leandroaraujodev_pipeline` is a English model originally trained by leandroaraujodev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_leandroaraujodev_pipeline_en_5.5.0_3.0_1726527180709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_leandroaraujodev_pipeline_en_5.5.0_3.0_1726527180709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_leandroaraujodev_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_leandroaraujodev_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_leandroaraujodev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/leandroaraujodev/roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_en.md new file mode 100644 index 00000000000000..8481a07b8067f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_squad_model1 RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_base_squad_model1 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad_model1` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad_model1_en_5.5.0_3.0_1726460703004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad_model1_en_5.5.0_3.0_1726460703004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad_model1","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad_model1", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad_model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|461.9 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-base-squad-model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_pipeline_en.md new file mode 100644 index 00000000000000..129432efd6db58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_tweet_topic_single_2020_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_base_tweet_topic_single_2020_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_tweet_topic_single_2020_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_single_2020_pipeline_en_5.5.0_3.0_1726518173810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_single_2020_pipeline_en_5.5.0_3.0_1726518173810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_tweet_topic_single_2020_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_tweet_topic_single_2020_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_tweet_topic_single_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.4 MB| + +## References + +https://huggingface.co/cardiffnlp/roberta-base-tweet-topic-single-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_coping_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_coping_en.md new file mode 100644 index 00000000000000..8bacf45ec20da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_coping_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_coping RoBertaForSequenceClassification from marco-stranisci +author: John Snow Labs +name: roberta_coping +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_coping` is a English model originally trained by marco-stranisci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_coping_en_5.5.0_3.0_1726527773759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_coping_en_5.5.0_3.0_1726527773759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_coping","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_coping", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_coping| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/marco-stranisci/roberta-coping \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_freeform_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_freeform_en.md new file mode 100644 index 00000000000000..5fcb54f30d1f05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_freeform_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_freeform RoBertaForSequenceClassification from maretamasaeva +author: John Snow Labs +name: roberta_finetuned_freeform +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_freeform` is a English model originally trained by maretamasaeva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_freeform_en_5.5.0_3.0_1726518459684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_freeform_en_5.5.0_3.0_1726518459684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_freeform","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_freeform", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_freeform| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|457.5 MB| + +## References + +https://huggingface.co/maretamasaeva/roberta-finetuned-freeform \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_personality_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_personality_en.md new file mode 100644 index 00000000000000..b9bff8f20814a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_personality_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_personality RoBertaForSequenceClassification from Nasserelsaman +author: John Snow Labs +name: roberta_finetuned_personality +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_personality` is a English model originally trained by Nasserelsaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_personality_en_5.5.0_3.0_1726527127676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_personality_en_5.5.0_3.0_1726527127676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_personality","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_personality", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_personality| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Nasserelsaman/roberta-finetuned-personality \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_personality_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_personality_pipeline_en.md new file mode 100644 index 00000000000000..c2b09d8e2d9151 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_personality_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_finetuned_personality_pipeline pipeline RoBertaForSequenceClassification from Nasserelsaman +author: John Snow Labs +name: roberta_finetuned_personality_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_personality_pipeline` is a English model originally trained by Nasserelsaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_personality_pipeline_en_5.5.0_3.0_1726527221389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_personality_pipeline_en_5.5.0_3.0_1726527221389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_finetuned_personality_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_finetuned_personality_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_personality_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Nasserelsaman/roberta-finetuned-personality + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline_en.md new file mode 100644 index 00000000000000..41945f18afc5d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline pipeline RoBertaForQuestionAnswering from Tamnemtf +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline` is a English model originally trained by Tamnemtf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline_en_5.5.0_3.0_1726501708751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline_en_5.5.0_3.0_1726501708751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Sample input; the "question"/"context" column names are assumed to match the pipeline's inputs
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+
+pipeline = PretrainedPipeline("roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// Sample input; the "question"/"context" column names are assumed to match the pipeline's inputs
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+
+val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_tamnemtf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Tamnemtf/roberta-finetuned-subjqa-movies_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_deletion_multiclass_complete_final_v2_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_deletion_multiclass_complete_final_v2_en.md new file mode 100644 index 00000000000000..e230445f255d88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_deletion_multiclass_complete_final_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_deletion_multiclass_complete_final_v2 RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_deletion_multiclass_complete_final_v2 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_deletion_multiclass_complete_final_v2` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_deletion_multiclass_complete_final_v2_en_5.5.0_3.0_1726527913920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_deletion_multiclass_complete_final_v2_en_5.5.0_3.0_1726527913920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_deletion_multiclass_complete_final_v2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_deletion_multiclass_complete_final_v2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_deletion_multiclass_complete_final_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta-large_deletion_multiclass_complete_final_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_few_shot_k_512_finetuned_squad_seed_4_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_few_shot_k_512_finetuned_squad_seed_4_en.md new file mode 100644 index 00000000000000..de0b8eba2ab0b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_few_shot_k_512_finetuned_squad_seed_4_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_few_shot_k_512_finetuned_squad_seed_4 RoBertaForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: roberta_large_few_shot_k_512_finetuned_squad_seed_4 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_few_shot_k_512_finetuned_squad_seed_4` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_512_finetuned_squad_seed_4_en_5.5.0_3.0_1726501654536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_512_finetuned_squad_seed_4_en_5.5.0_3.0_1726501654536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import RoBertaForQuestionAnswering
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_few_shot_k_512_finetuned_squad_seed_4","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols("question", "context")
+  .setOutputCols("document_question", "document_context")
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_few_shot_k_512_finetuned_squad_seed_4", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
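+
+The extracted answer span is written to the `answer` column configured above. Assuming the `pipelineDF` DataFrame from the Python example, it can be displayed like this:
+
+```python
+# "answer.result" contains the predicted answer text for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```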
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_few_shot_k_512_finetuned_squad_seed_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anas-awadalla/roberta-large-few-shot-k-512-finetuned-squad-seed-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_finetuned_abbr_weightdecay0_0001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_finetuned_abbr_weightdecay0_0001_pipeline_en.md new file mode 100644 index 00000000000000..3855a09370450a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_finetuned_abbr_weightdecay0_0001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_abbr_weightdecay0_0001_pipeline pipeline RoBertaForTokenClassification from karsimkh +author: John Snow Labs +name: roberta_large_finetuned_abbr_weightdecay0_0001_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abbr_weightdecay0_0001_pipeline` is a English model originally trained by karsimkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_weightdecay0_0001_pipeline_en_5.5.0_3.0_1726452869290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_weightdecay0_0001_pipeline_en_5.5.0_3.0_1726452869290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_finetuned_abbr_weightdecay0_0001_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_finetuned_abbr_weightdecay0_0001_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abbr_weightdecay0_0001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/karsimkh/roberta-large-finetuned-abbr-WeightDecay0.0001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_finetuned_cola_cvapict_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_finetuned_cola_cvapict_en.md new file mode 100644 index 00000000000000..bd5084686a65e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_finetuned_cola_cvapict_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_cola_cvapict RoBertaForSequenceClassification from cvapict +author: John Snow Labs +name: roberta_large_finetuned_cola_cvapict +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_cola_cvapict` is a English model originally trained by cvapict. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_cola_cvapict_en_5.5.0_3.0_1726471046077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_cola_cvapict_en_5.5.0_3.0_1726471046077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_cola_cvapict","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_cola_cvapict", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_cola_cvapict| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cvapict/roberta-large-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_go_emotions_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_go_emotions_v3_pipeline_en.md new file mode 100644 index 00000000000000..d59e2090faa7c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_go_emotions_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_go_emotions_v3_pipeline pipeline RoBertaForSequenceClassification from Prasadrao +author: John Snow Labs +name: roberta_large_go_emotions_v3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_go_emotions_v3_pipeline` is a English model originally trained by Prasadrao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_go_emotions_v3_pipeline_en_5.5.0_3.0_1726505126878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_go_emotions_v3_pipeline_en_5.5.0_3.0_1726505126878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_go_emotions_v3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_go_emotions_v3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_go_emotions_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/Prasadrao/roberta-large-go-emotions_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_lora_1_57m_squad_model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_lora_1_57m_squad_model3_pipeline_en.md new file mode 100644 index 00000000000000..85822aba2d3297 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_lora_1_57m_squad_model3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_large_lora_1_57m_squad_model3_pipeline pipeline RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_large_lora_1_57m_squad_model3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_lora_1_57m_squad_model3_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_lora_1_57m_squad_model3_pipeline_en_5.5.0_3.0_1726502416508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_lora_1_57m_squad_model3_pipeline_en_5.5.0_3.0_1726502416508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_lora_1_57m_squad_model3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_lora_1_57m_squad_model3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_lora_1_57m_squad_model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|911.1 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-lora-1.57M-squad-model3 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_fld_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_fld_pipeline_en.md new file mode 100644 index 00000000000000..8c57f2db2a6aa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_fld_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_mnli_fld_pipeline pipeline RoBertaForSequenceClassification from cestwc +author: John Snow Labs +name: roberta_large_mnli_fld_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_fld_pipeline` is a English model originally trained by cestwc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_fld_pipeline_en_5.5.0_3.0_1726470683362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_fld_pipeline_en_5.5.0_3.0_1726470683362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_mnli_fld_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_mnli_fld_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_fld_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cestwc/roberta-large-mnli-fld + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_ricechem_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_ricechem_en.md new file mode 100644 index 00000000000000..9acd2d76b9e07e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_ricechem_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_mnli_ricechem RoBertaForSequenceClassification from kangqi-ni +author: John Snow Labs +name: roberta_large_mnli_ricechem +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_ricechem` is a English model originally trained by kangqi-ni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_ricechem_en_5.5.0_3.0_1726505178666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_ricechem_en_5.5.0_3.0_1726505178666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_mnli_ricechem","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_mnli_ricechem", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
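+
+For quick experiments on single sentences it can be more convenient to wrap the fitted pipeline in a `LightPipeline` instead of going through a DataFrame. A small sketch, assuming the `pipelineModel` fitted in the Python example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages on plain Python strings, without building a DataFrame.
+light = LightPipeline(pipelineModel)
+result = light.annotate("I love spark-nlp")
+print(result["class"])  # predicted label(s) for the input sentence
+```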
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_ricechem| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kangqi-ni/roberta-large-mnli-ricechem \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_ricechem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_ricechem_pipeline_en.md new file mode 100644 index 00000000000000..772482c40ee6a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_mnli_ricechem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_mnli_ricechem_pipeline pipeline RoBertaForSequenceClassification from kangqi-ni +author: John Snow Labs +name: roberta_large_mnli_ricechem_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_ricechem_pipeline` is a English model originally trained by kangqi-ni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_ricechem_pipeline_en_5.5.0_3.0_1726505259999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_ricechem_pipeline_en_5.5.0_3.0_1726505259999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_mnli_ricechem_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_mnli_ricechem_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_ricechem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kangqi-ni/roberta-large-mnli-ricechem + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_oscar_chen_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_oscar_chen_en.md new file mode 100644 index 00000000000000..ef2b64d90254e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_oscar_chen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_oscar_chen RoBertaForSequenceClassification from Oscar-chen +author: John Snow Labs +name: roberta_large_oscar_chen +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_oscar_chen` is a English model originally trained by Oscar-chen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_oscar_chen_en_5.5.0_3.0_1726519270659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_oscar_chen_en_5.5.0_3.0_1726519270659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_oscar_chen","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_oscar_chen", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_oscar_chen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Oscar-chen/roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_oscar_chen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_oscar_chen_pipeline_en.md new file mode 100644 index 00000000000000..2344dd6ec87ec1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_oscar_chen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_oscar_chen_pipeline pipeline RoBertaForSequenceClassification from Oscar-chen +author: John Snow Labs +name: roberta_large_oscar_chen_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_oscar_chen_pipeline` is a English model originally trained by Oscar-chen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_oscar_chen_pipeline_en_5.5.0_3.0_1726519343257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_oscar_chen_pipeline_en_5.5.0_3.0_1726519343257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_oscar_chen_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_oscar_chen_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_oscar_chen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Oscar-chen/roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_squad2_fine_tuned_9e_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_squad2_fine_tuned_9e_en.md new file mode 100644 index 00000000000000..d178e944580e81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_squad2_fine_tuned_9e_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_squad2_fine_tuned_9e RoBertaForQuestionAnswering from marwanimroz18 +author: John Snow Labs +name: roberta_large_squad2_fine_tuned_9e +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_squad2_fine_tuned_9e` is a English model originally trained by marwanimroz18. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_squad2_fine_tuned_9e_en_5.5.0_3.0_1726502000501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_squad2_fine_tuned_9e_en_5.5.0_3.0_1726502000501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import RoBertaForQuestionAnswering
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_squad2_fine_tuned_9e","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols("question", "context")
+  .setOutputCols("document_question", "document_context")
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_squad2_fine_tuned_9e", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_squad2_fine_tuned_9e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/marwanimroz18/roberta-large-squad2-fine-tuned-9e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_sst_2_32_13_30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_sst_2_32_13_30_pipeline_en.md new file mode 100644 index 00000000000000..e25d87d712326d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_sst_2_32_13_30_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_sst_2_32_13_30_pipeline pipeline RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_large_sst_2_32_13_30_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_sst_2_32_13_30_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_32_13_30_pipeline_en_5.5.0_3.0_1726527640208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_32_13_30_pipeline_en_5.5.0_3.0_1726527640208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_large_sst_2_32_13_30_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_large_sst_2_32_13_30_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_sst_2_32_13_30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/simonycl/roberta-large-sst-2-32-13-30 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_llm_classfier_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_llm_classfier_en.md new file mode 100644 index 00000000000000..cfdca9e2da1c3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_llm_classfier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_llm_classfier RoBertaForSequenceClassification from bayesmaxxer +author: John Snow Labs +name: roberta_llm_classfier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_llm_classfier` is a English model originally trained by bayesmaxxer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_llm_classfier_en_5.5.0_3.0_1726518228862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_llm_classfier_en_5.5.0_3.0_1726518228862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_llm_classfier","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_llm_classfier", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
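+
+Besides the winning label in `result`, each annotation in the `class` column typically carries the per-label confidence scores in its `metadata` field (the exact keys can vary by model). Assuming the `pipelineDF` DataFrame from the Python snippet, they can be inspected with:
+
+```python
+# "metadata" is a map of label -> score for each classified document (schema may vary by model).
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```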
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_llm_classfier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.8 MB| + +## References + +https://huggingface.co/bayesmaxxer/roberta-llm-classfier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_llm_classfier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_llm_classfier_pipeline_en.md new file mode 100644 index 00000000000000..bb2ac3073a8692 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_llm_classfier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_llm_classfier_pipeline pipeline RoBertaForSequenceClassification from bayesmaxxer +author: John Snow Labs +name: roberta_llm_classfier_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_llm_classfier_pipeline` is a English model originally trained by bayesmaxxer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_llm_classfier_pipeline_en_5.5.0_3.0_1726518252416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_llm_classfier_pipeline_en_5.5.0_3.0_1726518252416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_llm_classfier_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_llm_classfier_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_llm_classfier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.8 MB| + +## References + +https://huggingface.co/bayesmaxxer/roberta-llm-classfier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_mixin_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_mixin_en.md new file mode 100644 index 00000000000000..199567fd91304d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_mixin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_mixin RoBertaForSequenceClassification from OlegOrwell +author: John Snow Labs +name: roberta_mixin +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mixin` is a English model originally trained by OlegOrwell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mixin_en_5.5.0_3.0_1726518393807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mixin_en_5.5.0_3.0_1726518393807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_mixin","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_mixin", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
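+
+If the fitted pipeline is going to be reused, it can be persisted with the standard Spark ML writer. A sketch, assuming the `pipelineModel` from the example above and an illustrative output path of your choice:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Persist the fitted pipeline (the path below is only an example) and load it back later.
+pipelineModel.write().overwrite().save("/tmp/roberta_mixin_model")
+restored = PipelineModel.load("/tmp/roberta_mixin_model")
+restored.transform(data).select("class.result").show(truncate=False)
+```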
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mixin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|415.5 MB| + +## References + +https://huggingface.co/OlegOrwell/roberta_mixin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_mixin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_mixin_pipeline_en.md new file mode 100644 index 00000000000000..3efd65631f376d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_mixin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_mixin_pipeline pipeline RoBertaForSequenceClassification from OlegOrwell +author: John Snow Labs +name: roberta_mixin_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mixin_pipeline` is a English model originally trained by OlegOrwell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mixin_pipeline_en_5.5.0_3.0_1726518437960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mixin_pipeline_en_5.5.0_3.0_1726518437960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_mixin_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_mixin_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mixin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.6 MB| + +## References + +https://huggingface.co/OlegOrwell/roberta_mixin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_tweet_eval_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_tweet_eval_finetuned_en.md new file mode 100644 index 00000000000000..b1876905c561c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_tweet_eval_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tweet_eval_finetuned RoBertaForSequenceClassification from cruiser +author: John Snow Labs +name: roberta_tweet_eval_finetuned +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tweet_eval_finetuned` is a English model originally trained by cruiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tweet_eval_finetuned_en_5.5.0_3.0_1726527298512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tweet_eval_finetuned_en_5.5.0_3.0_1726527298512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_tweet_eval_finetuned","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_tweet_eval_finetuned", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tweet_eval_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.1 MB| + +## References + +https://huggingface.co/cruiser/roberta_tweet_eval_finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_untrained_3eps_seed590_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_untrained_3eps_seed590_en.md new file mode 100644 index 00000000000000..92fbfb6a79e67e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_untrained_3eps_seed590_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_untrained_3eps_seed590 RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_3eps_seed590 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_3eps_seed590` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_3eps_seed590_en_5.5.0_3.0_1726504471908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_3eps_seed590_en_5.5.0_3.0_1726504471908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_3eps_seed590","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_3eps_seed590", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_3eps_seed590| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_3eps_seed590 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_untrained_3eps_seed590_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_untrained_3eps_seed590_pipeline_en.md new file mode 100644 index 00000000000000..9d2b996ec38762 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_untrained_3eps_seed590_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_untrained_3eps_seed590_pipeline pipeline RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_3eps_seed590_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_3eps_seed590_pipeline` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_3eps_seed590_pipeline_en_5.5.0_3.0_1726504500157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_3eps_seed590_pipeline_en_5.5.0_3.0_1726504500157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("roberta_untrained_3eps_seed590_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("roberta_untrained_3eps_seed590_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_3eps_seed590_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_3eps_seed590 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-rubert_base_cased_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-rubert_base_cased_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..56bfbf989c5443 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-rubert_base_cased_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rubert_base_cased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from KirrAno93 +author: John Snow Labs +name: rubert_base_cased_finetuned_squad_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_base_cased_finetuned_squad_pipeline` is a English model originally trained by KirrAno93. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_base_cased_finetuned_squad_pipeline_en_5.5.0_3.0_1726490041094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_base_cased_finetuned_squad_pipeline_en_5.5.0_3.0_1726490041094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+pipeline = PretrainedPipeline("rubert_base_cased_finetuned_squad_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame that holds the input column(s) this pipeline expects (see "Included Models" below)
+val pipeline = new PretrainedPipeline("rubert_base_cased_finetuned_squad_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_base_cased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/KirrAno93/rubert-base-cased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-rubert_tiny2_russian_emotion_detection_ru.md b/docs/_posts/ahmedlone127/2024-09-16-rubert_tiny2_russian_emotion_detection_ru.md new file mode 100644 index 00000000000000..e79da15728efb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-rubert_tiny2_russian_emotion_detection_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_tiny2_russian_emotion_detection BertForSequenceClassification from Djacon +author: John Snow Labs +name: rubert_tiny2_russian_emotion_detection +date: 2024-09-16 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_russian_emotion_detection` is a Russian model originally trained by Djacon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_emotion_detection_ru_5.5.0_3.0_1726459113177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_emotion_detection_ru_5.5.0_3.0_1726459113177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier reads the DocumentAssembler output, so the column is "document"
+sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_russian_emotion_detection","ru") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_russian_emotion_detection", "ru")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
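+
+Once the pipeline above has run, the predicted emotion labels sit inside the `class` annotation column. A minimal follow-up sketch, assuming the Python example has just been executed:
+
+```python
+from pyspark.sql.functions import explode
+
+# "result" holds the predicted label(s) for each input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# label frequency over a larger dataset (illustrative)
+pipelineDF.select(explode("class.result").alias("label")).groupBy("label").count().show()
+```
+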
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_russian_emotion_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/Djacon/rubert-tiny2-russian-emotion-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-salamathanksfil2env3_en.md b/docs/_posts/ahmedlone127/2024-09-16-salamathanksfil2env3_en.md new file mode 100644 index 00000000000000..d129fa6c3e87a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-salamathanksfil2env3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English salamathanksfil2env3 MarianTransformer from jimacasaet +author: John Snow Labs +name: salamathanksfil2env3 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`salamathanksfil2env3` is a English model originally trained by jimacasaet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/salamathanksfil2env3_en_5.5.0_3.0_1726490981307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/salamathanksfil2env3_en_5.5.0_3.0_1726490981307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# split documents into sentences; MarianTransformer consumes the "sentence" column
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("salamathanksfil2env3","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("salamathanksfil2env3","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
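+
+With the column wiring above, each detected sentence ends up translated in the `translation` column. A short follow-up sketch, assuming `pipelineDF` from the Python example:
+
+```python
+# one translated string per detected sentence
+pipelineDF.selectExpr("explode(translation.result) as translated_text").show(truncate=False)
+```
+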
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|salamathanksfil2env3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|496.5 MB| + +## References + +https://huggingface.co/jimacasaet/SalamaThanksFIL2ENv3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-salamathankstransformer_en2fil_v1_en.md b/docs/_posts/ahmedlone127/2024-09-16-salamathankstransformer_en2fil_v1_en.md new file mode 100644 index 00000000000000..a7f52bfc19b6f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-salamathankstransformer_en2fil_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English salamathankstransformer_en2fil_v1 MarianTransformer from SalamaThanks +author: John Snow Labs +name: salamathankstransformer_en2fil_v1 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`salamathankstransformer_en2fil_v1` is a English model originally trained by SalamaThanks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/salamathankstransformer_en2fil_v1_en_5.5.0_3.0_1726509742248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/salamathankstransformer_en2fil_v1_en_5.5.0_3.0_1726509742248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("salamathankstransformer_en2fil_v1","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("salamathankstransformer_en2fil_v1","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
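+
+For quick single-string experiments it can be more convenient to wrap the fitted model in a `LightPipeline` instead of building a DataFrame. A hedged sketch, assuming `pipelineModel` from the example above (the sample sentence is illustrative only):
+
+```python
+from sparknlp.base import LightPipeline
+
+# runs the same stages locally, without a DataFrame round trip
+light = LightPipeline(pipelineModel)
+annotations = light.annotate("The weather is nice today.")
+print(annotations["translation"])
+```
+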
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|salamathankstransformer_en2fil_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|496.7 MB| + +## References + +https://huggingface.co/SalamaThanks/SalamaThanksTransformer_en2fil_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-salamathankstransformer_en2fil_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-salamathankstransformer_en2fil_v1_pipeline_en.md new file mode 100644 index 00000000000000..fa105df71b50d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-salamathankstransformer_en2fil_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English salamathankstransformer_en2fil_v1_pipeline pipeline MarianTransformer from SalamaThanks +author: John Snow Labs +name: salamathankstransformer_en2fil_v1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`salamathankstransformer_en2fil_v1_pipeline` is a English model originally trained by SalamaThanks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/salamathankstransformer_en2fil_v1_pipeline_en_5.5.0_3.0_1726509768059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/salamathankstransformer_en2fil_v1_pipeline_en_5.5.0_3.0_1726509768059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("salamathankstransformer_en2fil_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("salamathankstransformer_en2fil_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|salamathankstransformer_en2fil_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|497.3 MB| + +## References + +https://huggingface.co/SalamaThanks/SalamaThanksTransformer_en2fil_v1 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_en.md b/docs/_posts/ahmedlone127/2024-09-16-sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_en.md new file mode 100644 index 00000000000000..ffae869ee93f14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sanskrit_saskta_tweet_roberta_large_e12_w1_1_5 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sanskrit_saskta_tweet_roberta_large_e12_w1_1_5 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sanskrit_saskta_tweet_roberta_large_e12_w1_1_5` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_en_5.5.0_3.0_1726519117400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_en_5.5.0_3.0_1726519117400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sanskrit_saskta_tweet_roberta_large_e12_w1_1_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sanskrit_saskta_tweet_roberta_large_e12_w1_1_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sanskrit_saskta_tweet_roberta_large_e12_w1_1_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/SA-tweet-roberta-large-e12-w1-1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline_en.md new file mode 100644 index 00000000000000..26bbc06a4261ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline_en_5.5.0_3.0_1726519182150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline_en_5.5.0_3.0_1726519182150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sanskrit_saskta_tweet_roberta_large_e12_w1_1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/SA-tweet-roberta-large-e12-w1-1.5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bangla_bert_bn.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bangla_bert_bn.md new file mode 100644 index 00000000000000..b73f7d84d375f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bangla_bert_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali sent_bangla_bert BertSentenceEmbeddings from Kowsher +author: John Snow Labs +name: sent_bangla_bert +date: 2024-09-16 +tags: [bn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bangla_bert` is a Bengali model originally trained by Kowsher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bangla_bert_bn_5.5.0_3.0_1726500624073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bangla_bert_bn_5.5.0_3.0_1726500624073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bangla_bert","bn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bangla_bert","bn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
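+
+The `embeddings` column produced above contains one sentence-level vector per detected sentence. A small inspection sketch, assuming the Python example has been run (the vector dimensionality depends on the model and is not taken from this card):
+
+```python
+from pyspark.sql.functions import explode
+
+# each annotation carries the sentence text in "result" and the vector in "embeddings"
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result as sentence", "size(emb.embeddings) as dim") \
+    .show(truncate=False)
+```
+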
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bangla_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|bn| +|Size:|612.1 MB| + +## References + +https://huggingface.co/Kowsher/bangla-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bangla_bert_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bangla_bert_pipeline_bn.md new file mode 100644 index 00000000000000..fa971e390a8cf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bangla_bert_pipeline_bn.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Bengali sent_bangla_bert_pipeline pipeline BertSentenceEmbeddings from Kowsher +author: John Snow Labs +name: sent_bangla_bert_pipeline +date: 2024-09-16 +tags: [bn, open_source, pipeline, onnx] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bangla_bert_pipeline` is a Bengali model originally trained by Kowsher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bangla_bert_pipeline_bn_5.5.0_3.0_1726500652319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bangla_bert_pipeline_bn_5.5.0_3.0_1726500652319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bangla_bert_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bangla_bert_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
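+
+Besides `transform`, a `PretrainedPipeline` can be probed on a single string with `annotate`. A brief sketch (the Bengali sample text is illustrative; the returned keys follow the pipeline's output column names):
+
+```python
+# returns a plain Python dict keyed by the pipeline's output columns
+result = pipeline.annotate("আমি বাংলায় গান গাই")
+print(result.keys())
+```
+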
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bangla_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|612.7 MB| + +## References + +https://huggingface.co/Kowsher/bangla-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bert_korean_small_ko.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_korean_small_ko.md new file mode 100644 index 00000000000000..069980582a88cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_korean_small_ko.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Korean sent_bert_korean_small BertSentenceEmbeddings from lassl +author: John Snow Labs +name: sent_bert_korean_small +date: 2024-09-16 +tags: [ko, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_korean_small` is a Korean model originally trained by lassl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_korean_small_ko_5.5.0_3.0_1726522116325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_korean_small_ko_5.5.0_3.0_1726522116325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_korean_small","ko") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_korean_small","ko") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_korean_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ko| +|Size:|85.6 MB| + +## References + +https://huggingface.co/lassl/bert-ko-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bert_korean_small_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_korean_small_pipeline_ko.md new file mode 100644 index 00000000000000..2a8517ea04259a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_korean_small_pipeline_ko.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Korean sent_bert_korean_small_pipeline pipeline BertSentenceEmbeddings from lassl +author: John Snow Labs +name: sent_bert_korean_small_pipeline +date: 2024-09-16 +tags: [ko, open_source, pipeline, onnx] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_korean_small_pipeline` is a Korean model originally trained by lassl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_korean_small_pipeline_ko_5.5.0_3.0_1726522120748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_korean_small_pipeline_ko_5.5.0_3.0_1726522120748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_korean_small_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_korean_small_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_korean_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|86.2 MB| + +## References + +https://huggingface.co/lassl/bert-ko-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bert_next_word_prediction_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_next_word_prediction_en.md new file mode 100644 index 00000000000000..5d992f34ec6357 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_next_word_prediction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_next_word_prediction BertSentenceEmbeddings from MattNandavong +author: John Snow Labs +name: sent_bert_next_word_prediction +date: 2024-09-16 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_next_word_prediction` is a English model originally trained by MattNandavong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_next_word_prediction_en_5.5.0_3.0_1726528704984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_next_word_prediction_en_5.5.0_3.0_1726528704984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_next_word_prediction","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_next_word_prediction","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
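+
+To feed the sentence vectors into a downstream Spark ML estimator, an `EmbeddingsFinisher` stage can be appended. A hedged sketch, assuming the stages defined in the Python example above; the extra stage is an addition for illustration, not part of the original card:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# converts Spark NLP annotation structs into Spark ML vectors
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
+pipeline.fit(data).transform(data).select("finished_embeddings").show(truncate=False)
+```
+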
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_next_word_prediction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MattNandavong/bert-next-word-prediction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bowdpr_marco_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bowdpr_marco_pipeline_en.md new file mode 100644 index 00000000000000..4614074c3afda2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bowdpr_marco_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bowdpr_marco_pipeline pipeline BertSentenceEmbeddings from bowdpr +author: John Snow Labs +name: sent_bowdpr_marco_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bowdpr_marco_pipeline` is a English model originally trained by bowdpr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bowdpr_marco_pipeline_en_5.5.0_3.0_1726500506416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bowdpr_marco_pipeline_en_5.5.0_3.0_1726500506416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bowdpr_marco_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bowdpr_marco_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bowdpr_marco_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/bowdpr/bowdpr_marco + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_morrbert_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_morrbert_en.md new file mode 100644 index 00000000000000..3910782e26e48c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_morrbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_morrbert BertSentenceEmbeddings from otmangi +author: John Snow Labs +name: sent_morrbert +date: 2024-09-16 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_morrbert` is a English model originally trained by otmangi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_morrbert_en_5.5.0_3.0_1726522730105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_morrbert_en_5.5.0_3.0_1726522730105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_morrbert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_morrbert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_morrbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|469.9 MB| + +## References + +https://huggingface.co/otmangi/MorrBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_roberta_base_culinary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_roberta_base_culinary_pipeline_en.md new file mode 100644 index 00000000000000..a037cae743b34d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_roberta_base_culinary_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_roberta_base_culinary_pipeline pipeline BertSentenceEmbeddings from juancavallotti +author: John Snow Labs +name: sent_roberta_base_culinary_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roberta_base_culinary_pipeline` is a English model originally trained by juancavallotti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_base_culinary_pipeline_en_5.5.0_3.0_1726528857450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_base_culinary_pipeline_en_5.5.0_3.0_1726528857450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_roberta_base_culinary_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_roberta_base_culinary_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_base_culinary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.1 MB| + +## References + +https://huggingface.co/juancavallotti/roberta-base-culinary + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_en.md b/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_en.md new file mode 100644 index 00000000000000..fac81a15f35344 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_on_covid_tweets RoBertaForSequenceClassification from AmpomahChief +author: John Snow Labs +name: sentiment_analysis_on_covid_tweets +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_on_covid_tweets` is a English model originally trained by AmpomahChief. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_on_covid_tweets_en_5.5.0_3.0_1726456066088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_on_covid_tweets_en_5.5.0_3.0_1726456066088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_on_covid_tweets","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_on_covid_tweets", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
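+
+Beyond the predicted label, the classifier annotation also exposes per-class scores in its metadata. A hedged sketch, assuming `pipelineModel` from the example above (the sample tweet is illustrative only):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+annotated = light.fullAnnotate("Vaccines are finally available in my city!")[0]
+
+# "class" is the output column set above; metadata maps label names to scores
+prediction = annotated["class"][0]
+print(prediction.result, prediction.metadata)
+```
+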
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_on_covid_tweets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/AmpomahChief/sentiment_analysis_on_covid_tweets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-service_categorizer_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-service_categorizer_v2_pipeline_en.md new file mode 100644 index 00000000000000..79bd91f70ec7a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-service_categorizer_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English service_categorizer_v2_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: service_categorizer_v2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`service_categorizer_v2_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/service_categorizer_v2_pipeline_en_5.5.0_3.0_1726459264646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/service_categorizer_v2_pipeline_en_5.5.0_3.0_1726459264646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("service_categorizer_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("service_categorizer_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|service_categorizer_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.6 MB| + +## References + +https://huggingface.co/etadevosyan/service_categorizer_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-spanish_english_copied_en.md b/docs/_posts/ahmedlone127/2024-09-16-spanish_english_copied_en.md new file mode 100644 index 00000000000000..70ba250b4ba7ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-spanish_english_copied_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanish_english_copied MarianTransformer from americasnlp-lct-ehu +author: John Snow Labs +name: spanish_english_copied +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_english_copied` is a English model originally trained by americasnlp-lct-ehu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_english_copied_en_5.5.0_3.0_1726503029661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_english_copied_en_5.5.0_3.0_1726503029661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("spanish_english_copied","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("spanish_english_copied","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_english_copied| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.2 MB| + +## References + +https://huggingface.co/americasnlp-lct-ehu/es_en_copied \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-spanish_english_copied_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-spanish_english_copied_pipeline_en.md new file mode 100644 index 00000000000000..f02b1568bf3ca7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-spanish_english_copied_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spanish_english_copied_pipeline pipeline MarianTransformer from americasnlp-lct-ehu +author: John Snow Labs +name: spanish_english_copied_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_english_copied_pipeline` is a English model originally trained by americasnlp-lct-ehu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_english_copied_pipeline_en_5.5.0_3.0_1726503060244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_english_copied_pipeline_en_5.5.0_3.0_1726503060244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spanish_english_copied_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spanish_english_copied_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
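+
+The pipeline snippet above assumes a DataFrame `df` with a `text` column. A minimal sketch of preparing one and reading the output (the Spanish sample sentence is illustrative, and the `translation` output column name is an assumption carried over from the standalone model example rather than something stated on this card):
+
+```python
+# the first stage is a DocumentAssembler, so a single "text" column is expected
+df = spark.createDataFrame([["Me encanta usar Spark NLP."]]).toDF("text")
+
+annotations = pipeline.transform(df)
+annotations.select("translation.result").show(truncate=False)
+```
+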
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_english_copied_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|539.7 MB| + +## References + +https://huggingface.co/americasnlp-lct-ehu/es_en_copied + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-spanish_portuguese_en.md b/docs/_posts/ahmedlone127/2024-09-16-spanish_portuguese_en.md new file mode 100644 index 00000000000000..87fd325a3cfeb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-spanish_portuguese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanish_portuguese MarianTransformer from Ife +author: John Snow Labs +name: spanish_portuguese +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_portuguese` is a English model originally trained by Ife. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_portuguese_en_5.5.0_3.0_1726490809464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_portuguese_en_5.5.0_3.0_1726490809464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("spanish_portuguese","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("spanish_portuguese","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/Ife/ES-PT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-squad_a_only_train5_en.md b/docs/_posts/ahmedlone127/2024-09-16-squad_a_only_train5_en.md new file mode 100644 index 00000000000000..f007906d126814 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-squad_a_only_train5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English squad_a_only_train5 RoBertaForSequenceClassification from grakshit +author: John Snow Labs +name: squad_a_only_train5 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad_a_only_train5` is a English model originally trained by grakshit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad_a_only_train5_en_5.5.0_3.0_1726504804475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad_a_only_train5_en_5.5.0_3.0_1726504804475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="h3-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("squad_a_only_train5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("squad_a_only_train5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
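+
+If the predictions need to leave Spark NLP's annotation format, for example to export a CSV, a `Finisher` stage can flatten the `class` column into plain strings. A hedged sketch, assuming the stages defined in the Python example above; the extra stage is an addition for illustration, not part of the original card:
+
+```python
+from sparknlp.base import Finisher
+
+# flattens annotation structs into an array of plain label strings
+finisher = Finisher().setInputCols(["class"]).setOutputCols(["prediction"])
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier, finisher])
+pipeline.fit(data).transform(data).select("text", "prediction").show(truncate=False)
+```
+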
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad_a_only_train5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|446.4 MB| + +## References + +https://huggingface.co/grakshit/squad_a_only_train5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-squad_a_only_train5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-squad_a_only_train5_pipeline_en.md new file mode 100644 index 00000000000000..ee90c9a1c9b0c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-squad_a_only_train5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English squad_a_only_train5_pipeline pipeline RoBertaForSequenceClassification from grakshit +author: John Snow Labs +name: squad_a_only_train5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad_a_only_train5_pipeline` is a English model originally trained by grakshit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad_a_only_train5_pipeline_en_5.5.0_3.0_1726504831378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad_a_only_train5_pipeline_en_5.5.0_3.0_1726504831378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squad_a_only_train5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squad_a_only_train5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad_a_only_train5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.5 MB| + +## References + +https://huggingface.co/grakshit/squad_a_only_train5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-test_trainer_allevelly_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-test_trainer_allevelly_pipeline_en.md new file mode 100644 index 00000000000000..035ebe33d097c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-test_trainer_allevelly_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainer_allevelly_pipeline pipeline RoBertaForSequenceClassification from allevelly +author: John Snow Labs +name: test_trainer_allevelly_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_allevelly_pipeline` is a English model originally trained by allevelly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_allevelly_pipeline_en_5.5.0_3.0_1726504481392.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_allevelly_pipeline_en_5.5.0_3.0_1726504481392.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_trainer_allevelly_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_trainer_allevelly_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_allevelly_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/allevelly/test_trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_en.md b/docs/_posts/ahmedlone127/2024-09-16-text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_en.md new file mode 100644 index 00000000000000..58603b6cc31fd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_subject_classification_distilbert_base_uncased_single_label_pdf_zg DistilBertForSequenceClassification from acuvity +author: John Snow Labs +name: text_subject_classification_distilbert_base_uncased_single_label_pdf_zg +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_subject_classification_distilbert_base_uncased_single_label_pdf_zg` is a English model originally trained by acuvity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_en_5.5.0_3.0_1726506403558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_en_5.5.0_3.0_1726506403558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_subject_classification_distilbert_base_uncased_single_label_pdf_zg","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_subject_classification_distilbert_base_uncased_single_label_pdf_zg", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
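After `transform`, the predicted label sits in the `result` field of the `class` annotation column. A short sketch of pulling it out of `pipelineDF` from the example above:

```python
# Show each input text next to its predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)

# Or collect the labels as plain Python strings.
labels = [row["result"][0] for row in pipelineDF.select("class.result").collect()]
```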
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_subject_classification_distilbert_base_uncased_single_label_pdf_zg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/acuvity/text-subject_classification-distilbert-base-uncased-single_label-pdf-zg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline_en.md new file mode 100644 index 00000000000000..e58a8b2b53315b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline pipeline DistilBertForSequenceClassification from acuvity +author: John Snow Labs +name: text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline` is a English model originally trained by acuvity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline_en_5.5.0_3.0_1726506416578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline_en_5.5.0_3.0_1726506416578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_subject_classification_distilbert_base_uncased_single_label_pdf_zg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/acuvity/text-subject_classification-distilbert-base-uncased-single_label-pdf-zg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-tinybert_tajik_hs_hx_parentpretrained_en.md b/docs/_posts/ahmedlone127/2024-09-16-tinybert_tajik_hs_hx_parentpretrained_en.md new file mode 100644 index 00000000000000..6e1a7def55f838 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-tinybert_tajik_hs_hx_parentpretrained_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tinybert_tajik_hs_hx_parentpretrained BertForSequenceClassification from joseph10 +author: John Snow Labs +name: tinybert_tajik_hs_hx_parentpretrained +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_tajik_hs_hx_parentpretrained` is a English model originally trained by joseph10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_tajik_hs_hx_parentpretrained_en_5.5.0_3.0_1726498591053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_tajik_hs_hx_parentpretrained_en_5.5.0_3.0_1726498591053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_tajik_hs_hx_parentpretrained","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_tajik_hs_hx_parentpretrained", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
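For ad-hoc checks on single sentences, a `LightPipeline` avoids building a DataFrame at all. A minimal sketch, assuming the `pipelineModel` fitted in the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns a dict keyed by output column; result["class"] holds the predicted label(s).
result = light.annotate("I love spark-nlp")
print(result["class"])
```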
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_tajik_hs_hx_parentpretrained| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/joseph10/tinybert-TG-HS-HX-parentpretrained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-tinybert_tajik_hs_hx_parentpretrained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-tinybert_tajik_hs_hx_parentpretrained_pipeline_en.md new file mode 100644 index 00000000000000..4c0ad1659ed5bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-tinybert_tajik_hs_hx_parentpretrained_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tinybert_tajik_hs_hx_parentpretrained_pipeline pipeline BertForSequenceClassification from joseph10 +author: John Snow Labs +name: tinybert_tajik_hs_hx_parentpretrained_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_tajik_hs_hx_parentpretrained_pipeline` is a English model originally trained by joseph10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_tajik_hs_hx_parentpretrained_pipeline_en_5.5.0_3.0_1726498593558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_tajik_hs_hx_parentpretrained_pipeline_en_5.5.0_3.0_1726498593558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tinybert_tajik_hs_hx_parentpretrained_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tinybert_tajik_hs_hx_parentpretrained_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_tajik_hs_hx_parentpretrained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/joseph10/tinybert-TG-HS-HX-parentpretrained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline_en.md new file mode 100644 index 00000000000000..09638ab92edf7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline pipeline XlmRoBertaForSequenceClassification from anamelchor +author: John Snow Labs +name: tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline` is a English model originally trained by anamelchor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline_en_5.5.0_3.0_1726516718656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline_en_5.5.0_3.0_1726516718656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_uncased_all_english_finetuned_rte_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.2 MB| + +## References + +https://huggingface.co/anamelchor/tner-xlm-roberta-base-uncased-all-english-finetuned-rte + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-tofu_forget10_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-tofu_forget10_classifier_pipeline_en.md new file mode 100644 index 00000000000000..1c70881aab5613 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-tofu_forget10_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tofu_forget10_classifier_pipeline pipeline RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: tofu_forget10_classifier_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tofu_forget10_classifier_pipeline` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tofu_forget10_classifier_pipeline_en_5.5.0_3.0_1726505078339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tofu_forget10_classifier_pipeline_en_5.5.0_3.0_1726505078339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tofu_forget10_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tofu_forget10_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
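Besides `transform` on a DataFrame, `PretrainedPipeline` also exposes `annotate` for quick single-string checks. A small sketch using the standard Spark NLP import path:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("tofu_forget10_classifier_pipeline", lang="en")
# annotate() runs every stage on one string and returns a dict of annotation results.
print(pipeline.annotate("I love spark-nlp"))
```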
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tofu_forget10_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.1 MB| + +## References + +https://huggingface.co/chrisliu298/tofu_forget10_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-topic_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-topic_classification_pipeline_en.md new file mode 100644 index 00000000000000..9e14ddc056c450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-topic_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_classification_pipeline pipeline DistilBertForSequenceClassification from langwnwk +author: John Snow Labs +name: topic_classification_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_classification_pipeline` is a English model originally trained by langwnwk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_classification_pipeline_en_5.5.0_3.0_1726525845957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_classification_pipeline_en_5.5.0_3.0_1726525845957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("topic_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("topic_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/langwnwk/topic_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-toxicity_judge_en.md b/docs/_posts/ahmedlone127/2024-09-16-toxicity_judge_en.md new file mode 100644 index 00000000000000..467eeee2292aa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-toxicity_judge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxicity_judge DistilBertForSequenceClassification from yeshiovo +author: John Snow Labs +name: toxicity_judge +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_judge` is a English model originally trained by yeshiovo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_judge_en_5.5.0_3.0_1726506374292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_judge_en_5.5.0_3.0_1726506374292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicity_judge","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicity_judge", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_judge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yeshiovo/toxicity_judge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-trainer_4f_en.md b/docs/_posts/ahmedlone127/2024-09-16-trainer_4f_en.md new file mode 100644 index 00000000000000..ebb63b8406ea56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-trainer_4f_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer_4f DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer_4f +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_4f` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_4f_en_5.5.0_3.0_1726525794600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_4f_en_5.5.0_3.0_1726525794600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer_4f","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer_4f", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_4f| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer_4f \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_1_en.md b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_1_en.md new file mode 100644 index 00000000000000..5318cfc975faa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translate_model_fixed_v0_1 MarianTransformer from gshields +author: John Snow Labs +name: translate_model_fixed_v0_1 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translate_model_fixed_v0_1` is a English model originally trained by gshields. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_1_en_5.5.0_3.0_1726503262323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_1_en_5.5.0_3.0_1726503262323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("translate_model_fixed_v0_1","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("translate_model_fixed_v0_1","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
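With the stages wired as above, every detected sentence yields one translated string in `translation.result`. A brief sketch of reading the translations back out of `pipelineDF`:

```python
from pyspark.sql import functions as F

# One output row per translated sentence.
pipelineDF.select(F.explode("translation.result").alias("translated")).show(truncate=False)
```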
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translate_model_fixed_v0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|522.8 MB| + +## References + +https://huggingface.co/gshields/translate_model_fixed_v0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_1_pipeline_en.md new file mode 100644 index 00000000000000..d0501058608c3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English translate_model_fixed_v0_1_pipeline pipeline MarianTransformer from gshields +author: John Snow Labs +name: translate_model_fixed_v0_1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translate_model_fixed_v0_1_pipeline` is a English model originally trained by gshields. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_1_pipeline_en_5.5.0_3.0_1726503286388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_1_pipeline_en_5.5.0_3.0_1726503286388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("translate_model_fixed_v0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("translate_model_fixed_v0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translate_model_fixed_v0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|523.4 MB| + +## References + +https://huggingface.co/gshields/translate_model_fixed_v0.1 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_3_en.md b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_3_en.md new file mode 100644 index 00000000000000..7c173125367bcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translate_model_fixed_v0_3 MarianTransformer from gshields +author: John Snow Labs +name: translate_model_fixed_v0_3 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translate_model_fixed_v0_3` is a English model originally trained by gshields. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_3_en_5.5.0_3.0_1726493766442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_3_en_5.5.0_3.0_1726493766442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("translate_model_fixed_v0_3","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("translate_model_fixed_v0_3","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translate_model_fixed_v0_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|522.8 MB| + +## References + +https://huggingface.co/gshields/translate_model_fixed_v0.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-turkish_medical_field_detection_8_en.md b/docs/_posts/ahmedlone127/2024-09-16-turkish_medical_field_detection_8_en.md new file mode 100644 index 00000000000000..1cdd4e3ce8bf18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-turkish_medical_field_detection_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English turkish_medical_field_detection_8 BertForSequenceClassification from sezinarseven +author: John Snow Labs +name: turkish_medical_field_detection_8 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_medical_field_detection_8` is a English model originally trained by sezinarseven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_medical_field_detection_8_en_5.5.0_3.0_1726499064805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_medical_field_detection_8_en_5.5.0_3.0_1726499064805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("turkish_medical_field_detection_8","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("turkish_medical_field_detection_8", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
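Classifier annotations also carry per-class scores in their metadata map. A hedged sketch of inspecting them alongside the predicted label; the field names come from the generic Spark NLP annotation schema, not from anything specific to this model:

```python
# `result` holds the winning label, `metadata` a map that includes the class scores.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```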
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_medical_field_detection_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|691.8 MB| + +## References + +https://huggingface.co/sezinarseven/turkish-medical-field-detection-8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-turkish_medical_field_detection_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-turkish_medical_field_detection_8_pipeline_en.md new file mode 100644 index 00000000000000..ddff39e6d206fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-turkish_medical_field_detection_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English turkish_medical_field_detection_8_pipeline pipeline BertForSequenceClassification from sezinarseven +author: John Snow Labs +name: turkish_medical_field_detection_8_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_medical_field_detection_8_pipeline` is a English model originally trained by sezinarseven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_medical_field_detection_8_pipeline_en_5.5.0_3.0_1726499098420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_medical_field_detection_8_pipeline_en_5.5.0_3.0_1726499098420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("turkish_medical_field_detection_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("turkish_medical_field_detection_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_medical_field_detection_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|691.8 MB| + +## References + +https://huggingface.co/sezinarseven/turkish-medical-field-detection-8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline_en.md new file mode 100644 index 00000000000000..04fa82c9a49e96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline_en_5.5.0_3.0_1726505146400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline_en_5.5.0_3.0_1726505146400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_dec2021_tweet_topic_single_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-single-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_sentiment_latest_nizar_sayad_en.md b/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_sentiment_latest_nizar_sayad_en.md new file mode 100644 index 00000000000000..0e771b8b64ed2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_sentiment_latest_nizar_sayad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_sentiment_latest_nizar_sayad RoBertaForSequenceClassification from nizar-sayad +author: John Snow Labs +name: twitter_roberta_base_sentiment_latest_nizar_sayad +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_sentiment_latest_nizar_sayad` is a English model originally trained by nizar-sayad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentiment_latest_nizar_sayad_en_5.5.0_3.0_1726470375423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentiment_latest_nizar_sayad_en_5.5.0_3.0_1726470375423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentiment_latest_nizar_sayad","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentiment_latest_nizar_sayad", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_sentiment_latest_nizar_sayad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/nizar-sayad/twitter-roberta-base-sentiment-latest \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-vanilla_dermat_es.md b/docs/_posts/ahmedlone127/2024-09-16-vanilla_dermat_es.md new file mode 100644 index 00000000000000..e4ce968863655b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-vanilla_dermat_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish vanilla_dermat RoBertaForSequenceClassification from fundacionctic +author: John Snow Labs +name: vanilla_dermat +date: 2024-09-16 +tags: [es, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vanilla_dermat` is a Castilian, Spanish model originally trained by fundacionctic. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vanilla_dermat_es_5.5.0_3.0_1726455706149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vanilla_dermat_es_5.5.0_3.0_1726455706149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("vanilla_dermat","es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("vanilla_dermat", "es")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vanilla_dermat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|432.3 MB| + +## References + +https://huggingface.co/fundacionctic/vanilla-dermat \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-verbnet_train_en.md b/docs/_posts/ahmedlone127/2024-09-16-verbnet_train_en.md new file mode 100644 index 00000000000000..63cf96ea1ce405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-verbnet_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English verbnet_train RoBertaForTokenClassification from gguichard +author: John Snow Labs +name: verbnet_train +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`verbnet_train` is a English model originally trained by gguichard. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/verbnet_train_en_5.5.0_3.0_1726529961493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/verbnet_train_en_5.5.0_3.0_1726529961493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForTokenClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("verbnet_train","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("verbnet_train", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
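Since the tags in `ner.result` are aligned one-to-one with the tokens in `token.result`, they can be paired positionally for inspection. A small, purely illustrative sketch using standard Spark SQL functions:

```python
from pyspark.sql import functions as F

# Zip each token with the tag predicted for it.
pipelineDF.select(
    F.arrays_zip("token.result", "ner.result").alias("token_tag_pairs")
).show(truncate=False)
```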
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|verbnet_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gguichard/verbnet_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_pipeline_en.md new file mode 100644 index 00000000000000..3c7a1219f25092 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English vietnamese_whisper_small_pipeline pipeline WhisperForCTC from DuyTa +author: John Snow Labs +name: vietnamese_whisper_small_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vietnamese_whisper_small_pipeline` is a English model originally trained by DuyTa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vietnamese_whisper_small_pipeline_en_5.5.0_3.0_1726485979999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vietnamese_whisper_small_pipeline_en_5.5.0_3.0_1726485979999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("vietnamese_whisper_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("vietnamese_whisper_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vietnamese_whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DuyTa/vi_whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_2e_4a_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_2e_4a_en.md new file mode 100644 index 00000000000000..48174592e32d37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_2e_4a_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_2e_4a WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_2e_4a +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_2e_4a` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_2e_4a_en_5.5.0_3.0_1726488799692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_2e_4a_en_5.5.0_3.0_1726488799692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_2e_4a","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_2e_4a", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
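The `data` variable used above is not defined in the snippet; it is assumed to be a DataFrame with an `audio_content` column of float samples at 16 kHz. A hedged sketch of building one with librosa follows; the library choice and the file path are assumptions, not part of the original card:

```python
import librosa

# Load a mono waveform as float samples; Whisper checkpoints expect 16 kHz audio.
samples, _ = librosa.load("audio_example.wav", sr=16000)

# One row per recording; the column name must match AudioAssembler's input column.
data = spark.createDataFrame([[samples.tolist()]], ["audio_content"])
```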
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_2e_4a| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yusufagung29/whisper_2e-4a \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_base_arabic_yazansalameh_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_base_arabic_yazansalameh_pipeline_ar.md new file mode 100644 index 00000000000000..4a8357f987f041 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_base_arabic_yazansalameh_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_base_arabic_yazansalameh_pipeline pipeline WhisperForCTC from YazanSalameh +author: John Snow Labs +name: whisper_base_arabic_yazansalameh_pipeline +date: 2024-09-16 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_arabic_yazansalameh_pipeline` is a Arabic model originally trained by YazanSalameh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_arabic_yazansalameh_pipeline_ar_5.5.0_3.0_1726476992468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_arabic_yazansalameh_pipeline_ar_5.5.0_3.0_1726476992468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_arabic_yazansalameh_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_arabic_yazansalameh_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_arabic_yazansalameh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|641.7 MB| + +## References + +https://huggingface.co/YazanSalameh/Whisper-base-Arabic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_base_hindi_atulksingh_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_base_hindi_atulksingh_en.md new file mode 100644 index 00000000000000..852862399a1ae3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_base_hindi_atulksingh_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_hindi_atulksingh WhisperForCTC from atulksingh +author: John Snow Labs +name: whisper_base_hindi_atulksingh +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_hindi_atulksingh` is a English model originally trained by atulksingh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_hindi_atulksingh_en_5.5.0_3.0_1726480375596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_hindi_atulksingh_en_5.5.0_3.0_1726480375596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_hindi_atulksingh","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_hindi_atulksingh", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_hindi_atulksingh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.2 MB| + +## References + +https://huggingface.co/atulksingh/whisper-base-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_base_jyutping_without_tones_full_chinese_hk_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_base_jyutping_without_tones_full_chinese_hk_en.md new file mode 100644 index 00000000000000..ca930f53b41862 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_base_jyutping_without_tones_full_chinese_hk_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_jyutping_without_tones_full_chinese_hk WhisperForCTC from AlienKevin +author: John Snow Labs +name: whisper_base_jyutping_without_tones_full_chinese_hk +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_jyutping_without_tones_full_chinese_hk` is a English model originally trained by AlienKevin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_jyutping_without_tones_full_chinese_hk_en_5.5.0_3.0_1726483177398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_jyutping_without_tones_full_chinese_hk_en_5.5.0_3.0_1726483177398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_jyutping_without_tones_full_chinese_hk","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_jyutping_without_tones_full_chinese_hk", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_jyutping_without_tones_full_chinese_hk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.4 MB| + +## References + +https://huggingface.co/AlienKevin/whisper-base-jyutping-without-tones-full-zh-HK \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_14000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_14000_pipeline_en.md new file mode 100644 index 00000000000000..58e447f475ddec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_14000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_medium_14000_pipeline pipeline WhisperForCTC from abdumalikov +author: John Snow Labs +name: whisper_medium_14000_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_14000_pipeline` is a English model originally trained by abdumalikov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_14000_pipeline_en_5.5.0_3.0_1726487877022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_14000_pipeline_en_5.5.0_3.0_1726487877022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_medium_14000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_medium_14000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_14000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abdumalikov/whisper-medium-14000 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_jensg_dv.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_jensg_dv.md new file mode 100644 index 00000000000000..80dcac5d676ec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_jensg_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_jensg WhisperForCTC from jensg +author: John Snow Labs +name: whisper_small_divehi_jensg +date: 2024-09-16 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_jensg` is a Dhivehi, Divehi, Maldivian model originally trained by jensg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jensg_dv_5.5.0_3.0_1726483465577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jensg_dv_5.5.0_3.0_1726483465577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_jensg","dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_jensg", "dv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_jensg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jensg/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_victorbarra_dv.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_victorbarra_dv.md new file mode 100644 index 00000000000000..0b38c260d01cae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_victorbarra_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_victorbarra WhisperForCTC from victorbarra +author: John Snow Labs +name: whisper_small_divehi_victorbarra +date: 2024-09-16 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_victorbarra` is a Dhivehi, Divehi, Maldivian model originally trained by victorbarra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_victorbarra_dv_5.5.0_3.0_1726478021547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_victorbarra_dv_5.5.0_3.0_1726478021547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_victorbarra","dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_victorbarra", "dv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_victorbarra| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/victorbarra/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_hindi_sr1ncvs_hi.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_hindi_sr1ncvs_hi.md new file mode 100644 index 00000000000000..4f8ecbdd1d0648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_hindi_sr1ncvs_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_sr1ncvs WhisperForCTC from sr1ncvs +author: John Snow Labs +name: whisper_small_hindi_sr1ncvs +date: 2024-09-16 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_sr1ncvs` is a Hindi model originally trained by sr1ncvs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_sr1ncvs_hi_5.5.0_3.0_1726476317697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_sr1ncvs_hi_5.5.0_3.0_1726476317697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_sr1ncvs","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_sr1ncvs", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_sr1ncvs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/sr1ncvs/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_korean_finetuned_single_speaker_3922samples_ko.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_korean_finetuned_single_speaker_3922samples_ko.md new file mode 100644 index 00000000000000..785bf01966f24a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_korean_finetuned_single_speaker_3922samples_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean whisper_small_korean_finetuned_single_speaker_3922samples WhisperForCTC from daekeun-ml +author: John Snow Labs +name: whisper_small_korean_finetuned_single_speaker_3922samples +date: 2024-09-16 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_finetuned_single_speaker_3922samples` is a Korean model originally trained by daekeun-ml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_finetuned_single_speaker_3922samples_ko_5.5.0_3.0_1726483574778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_finetuned_single_speaker_3922samples_ko_5.5.0_3.0_1726483574778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_korean_finetuned_single_speaker_3922samples","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_korean_finetuned_single_speaker_3922samples", "ko")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_finetuned_single_speaker_3922samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_korean_finetuned_single_speaker_3922samples_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_korean_finetuned_single_speaker_3922samples_pipeline_ko.md new file mode 100644 index 00000000000000..81b88b27d75279 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_korean_finetuned_single_speaker_3922samples_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_small_korean_finetuned_single_speaker_3922samples_pipeline pipeline WhisperForCTC from daekeun-ml +author: John Snow Labs +name: whisper_small_korean_finetuned_single_speaker_3922samples_pipeline +date: 2024-09-16 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_finetuned_single_speaker_3922samples_pipeline` is a Korean model originally trained by daekeun-ml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_finetuned_single_speaker_3922samples_pipeline_ko_5.5.0_3.0_1726483659413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_finetuned_single_speaker_3922samples_pipeline_ko_5.5.0_3.0_1726483659413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_korean_finetuned_single_speaker_3922samples_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_korean_finetuned_single_speaker_3922samples_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_finetuned_single_speaker_3922samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/daekeun-ml/whisper-small-ko-finetuned-single-speaker-3922samples + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_oxel40_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_oxel40_en.md new file mode 100644 index 00000000000000..2474447948bdca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_oxel40_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_oxel40 WhisperForCTC from Oxel40 +author: John Snow Labs +name: whisper_small_oxel40 +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_oxel40` is a English model originally trained by Oxel40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_oxel40_en_5.5.0_3.0_1726487598830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_oxel40_en_5.5.0_3.0_1726487598830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_oxel40","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_oxel40", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_oxel40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Oxel40/whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_oxel40_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_oxel40_pipeline_en.md new file mode 100644 index 00000000000000..26a73eae3d3a74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_oxel40_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_oxel40_pipeline pipeline WhisperForCTC from Oxel40 +author: John Snow Labs +name: whisper_small_oxel40_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_oxel40_pipeline` is a English model originally trained by Oxel40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_oxel40_pipeline_en_5.5.0_3.0_1726487684470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_oxel40_pipeline_en_5.5.0_3.0_1726487684470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_oxel40_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_oxel40_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_oxel40_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Oxel40/whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_telugu_parambharat_pipeline_te.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_telugu_parambharat_pipeline_te.md new file mode 100644 index 00000000000000..e18ee0012ff00e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_telugu_parambharat_pipeline_te.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Telugu whisper_small_telugu_parambharat_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_small_telugu_parambharat_pipeline +date: 2024-09-16 +tags: [te, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_parambharat_pipeline` is a Telugu model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_parambharat_pipeline_te_5.5.0_3.0_1726485453568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_parambharat_pipeline_te_5.5.0_3.0_1726485453568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_telugu_parambharat_pipeline", lang = "te") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_telugu_parambharat_pipeline", lang = "te") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_parambharat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|te| +|Size:|1.7 GB| + +## References + +https://huggingface.co/parambharat/whisper-small-te + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_telugu_parambharat_te.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_telugu_parambharat_te.md new file mode 100644 index 00000000000000..a0ab83a8dfcbcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_telugu_parambharat_te.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Telugu whisper_small_telugu_parambharat WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_small_telugu_parambharat +date: 2024-09-16 +tags: [te, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_parambharat` is a Telugu model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_parambharat_te_5.5.0_3.0_1726485368871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_parambharat_te_5.5.0_3.0_1726485368871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_telugu_parambharat","te") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_telugu_parambharat", "te")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_parambharat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|te| +|Size:|1.7 GB| + +## References + +https://huggingface.co/parambharat/whisper-small-te \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vietnamese_chuvash_fl_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vietnamese_chuvash_fl_v1_pipeline_en.md new file mode 100644 index 00000000000000..269e384fb559ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vietnamese_chuvash_fl_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_vietnamese_chuvash_fl_v1_pipeline pipeline WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_small_vietnamese_chuvash_fl_v1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietnamese_chuvash_fl_v1_pipeline` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_chuvash_fl_v1_pipeline_en_5.5.0_3.0_1726486946954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_chuvash_fl_v1_pipeline_en_5.5.0_3.0_1726486946954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_vietnamese_chuvash_fl_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_vietnamese_chuvash_fl_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietnamese_chuvash_fl_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/thanhduycao/whisper-small-vi-cv-fl-v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vtlustos_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vtlustos_en.md new file mode 100644 index 00000000000000..8cb34d0d458195 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vtlustos_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_vtlustos WhisperForCTC from vtlustos +author: John Snow Labs +name: whisper_small_vtlustos +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vtlustos` is a English model originally trained by vtlustos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vtlustos_en_5.5.0_3.0_1726485497173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vtlustos_en_5.5.0_3.0_1726485497173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_vtlustos","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_vtlustos", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vtlustos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/vtlustos/whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vtlustos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vtlustos_pipeline_en.md new file mode 100644 index 00000000000000..2b9464195c6e31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_vtlustos_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_vtlustos_pipeline pipeline WhisperForCTC from vtlustos +author: John Snow Labs +name: whisper_small_vtlustos_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vtlustos_pipeline` is a English model originally trained by vtlustos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vtlustos_pipeline_en_5.5.0_3.0_1726485582397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vtlustos_pipeline_en_5.5.0_3.0_1726485582397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_vtlustos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_vtlustos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vtlustos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/vtlustos/whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline_fy.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline_fy.md new file mode 100644 index 00000000000000..4be95a31c7e9ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline_fy.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Western Frisian whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline pipeline WhisperForCTC from polixonrio +author: John Snow Labs +name: whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline +date: 2024-09-16 +tags: [fy, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline` is a Western Frisian model originally trained by polixonrio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline_fy_5.5.0_3.0_1726485199075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline_fy_5.5.0_3.0_1726485199075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline", lang = "fy") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline", lang = "fy") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_western_frisian_dutch_transfer_from_english_polixonrio_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fy| +|Size:|1.7 GB| + +## References + +https://huggingface.co/polixonrio/whisper-small-fy-NL-Transfer-From-EN + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_sv3_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_sv3_pipeline_sv.md new file mode 100644 index 00000000000000..b9fc5389280906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_sv3_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whisper_sv3_pipeline pipeline WhisperForCTC from SebLih +author: John Snow Labs +name: whisper_sv3_pipeline +date: 2024-09-16 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_sv3_pipeline` is a Swedish model originally trained by SebLih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_sv3_pipeline_sv_5.5.0_3.0_1726484599370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_sv3_pipeline_sv_5.5.0_3.0_1726484599370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_sv3_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_sv3_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_sv3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SebLih/whisper-SV3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_sv3_sv.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_sv3_sv.md new file mode 100644 index 00000000000000..eed1d2d3581152 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_sv3_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whisper_sv3 WhisperForCTC from SebLih +author: John Snow Labs +name: whisper_sv3 +date: 2024-09-16 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_sv3` is a Swedish model originally trained by SebLih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_sv3_sv_5.5.0_3.0_1726484488982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_sv3_sv_5.5.0_3.0_1726484488982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_sv3","sv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_sv3", "sv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
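Because all stages are pretrained, fitting is inexpensive, but a fitted pipeline can also be persisted with the standard Spark ML writer and reloaded later; the path below is only an example.

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline from the example above (path is illustrative).
pipelineModel.write().overwrite().save("/tmp/whisper_sv3_pipeline")

# Reload it and transcribe new audio without refitting.
restored = PipelineModel.load("/tmp/whisper_sv3_pipeline")
restored.transform(data).select("text.result").show(truncate=False)
```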
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_sv3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SebLih/whisper-SV3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_cv17_hungarian_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_cv17_hungarian_pipeline_hu.md new file mode 100644 index 00000000000000..2cca527a9998fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_cv17_hungarian_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_tiny_cv17_hungarian_pipeline pipeline WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_tiny_cv17_hungarian_pipeline +date: 2024-09-16 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_cv17_hungarian_pipeline` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_cv17_hungarian_pipeline_hu_5.5.0_3.0_1726488104942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_cv17_hungarian_pipeline_hu_5.5.0_3.0_1726488104942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_cv17_hungarian_pipeline", lang = "hu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_cv17_hungarian_pipeline", lang = "hu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_cv17_hungarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|389.1 MB| + +## References + +https://huggingface.co/Hungarians/whisper-tiny-cv17-hu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_czech_kozak_vaclav_pipeline_cs.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_czech_kozak_vaclav_pipeline_cs.md new file mode 100644 index 00000000000000..873f8f22828920 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_czech_kozak_vaclav_pipeline_cs.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Czech whisper_tiny_czech_kozak_vaclav_pipeline pipeline WhisperForCTC from kozak-vaclav +author: John Snow Labs +name: whisper_tiny_czech_kozak_vaclav_pipeline +date: 2024-09-16 +tags: [cs, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: cs +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_czech_kozak_vaclav_pipeline` is a Czech model originally trained by kozak-vaclav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_czech_kozak_vaclav_pipeline_cs_5.5.0_3.0_1726483170530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_czech_kozak_vaclav_pipeline_cs_5.5.0_3.0_1726483170530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_czech_kozak_vaclav_pipeline", lang = "cs") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_czech_kozak_vaclav_pipeline", lang = "cs") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_czech_kozak_vaclav_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|cs| +|Size:|389.8 MB| + +## References + +https://huggingface.co/kozak-vaclav/whisper-tiny-cs + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_kollis_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_kollis_en.md new file mode 100644 index 00000000000000..5edfca4178a788 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_kollis_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_kollis WhisperForCTC from kollis +author: John Snow Labs +name: whisper_tiny_english_kollis +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_kollis` is a English model originally trained by kollis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_kollis_en_5.5.0_3.0_1726477969183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_kollis_en_5.5.0_3.0_1726477969183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_kollis","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_kollis", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_kollis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kollis/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_hamzasidhu786_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_hamzasidhu786_en.md new file mode 100644 index 00000000000000..97298f58374e82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_hamzasidhu786_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_us_hamzasidhu786 WhisperForCTC from HamzaSidhu786 +author: John Snow Labs +name: whisper_tiny_english_us_hamzasidhu786 +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_hamzasidhu786` is a English model originally trained by HamzaSidhu786. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_hamzasidhu786_en_5.5.0_3.0_1726476373646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_hamzasidhu786_en_5.5.0_3.0_1726476373646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_hamzasidhu786","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_hamzasidhu786", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_hamzasidhu786| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|242.7 MB| + +## References + +https://huggingface.co/HamzaSidhu786/whisper-tiny-en_US \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_hamzasidhu786_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_hamzasidhu786_pipeline_en.md new file mode 100644 index 00000000000000..cba5fff06624fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_hamzasidhu786_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_us_hamzasidhu786_pipeline pipeline WhisperForCTC from HamzaSidhu786 +author: John Snow Labs +name: whisper_tiny_english_us_hamzasidhu786_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_hamzasidhu786_pipeline` is a English model originally trained by HamzaSidhu786. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_hamzasidhu786_pipeline_en_5.5.0_3.0_1726476443832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_hamzasidhu786_pipeline_en_5.5.0_3.0_1726476443832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_english_us_hamzasidhu786_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_english_us_hamzasidhu786_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
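This pretrained pipeline bundles an AudioAssembler followed by WhisperForCTC (see Included Models below), so the `df` passed to `transform` is expected to carry raw audio samples in an `audio_content` column. A hedged sketch of reading the transcription back, reusing the `pipeline` object created above (column names are assumed to match the companion model card):

```python
# Assumption: `df` holds raw 16 kHz audio samples in an "audio_content" column,
# since the bundled AudioAssembler reads from it.
annotations = pipeline.transform(df)

# The WhisperForCTC stage is assumed to write its output to a "text" column.
annotations.selectExpr("explode(text.result) as transcription").show(truncate=False)
```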
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_hamzasidhu786_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|242.7 MB| + +## References + +https://huggingface.co/HamzaSidhu786/whisper-tiny-en_US + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_pipeline_en.md new file mode 100644 index 00000000000000..b393dece80bfaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_us_sfedar_pipeline pipeline WhisperForCTC from sfedar +author: John Snow Labs +name: whisper_tiny_english_us_sfedar_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_sfedar_pipeline` is a English model originally trained by sfedar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sfedar_pipeline_en_5.5.0_3.0_1726478977758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sfedar_pipeline_en_5.5.0_3.0_1726478977758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_english_us_sfedar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_english_us_sfedar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_sfedar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/sfedar/whisper-tiny-en-US + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_finetuned_minds14_lightmourne_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_finetuned_minds14_lightmourne_pipeline_en.md new file mode 100644 index 00000000000000..bec9927718a44f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_finetuned_minds14_lightmourne_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_lightmourne_pipeline pipeline WhisperForCTC from Lightmourne +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_lightmourne_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_lightmourne_pipeline` is a English model originally trained by Lightmourne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_lightmourne_pipeline_en_5.5.0_3.0_1726486770975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_lightmourne_pipeline_en_5.5.0_3.0_1726486770975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_finetuned_minds14_lightmourne_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_finetuned_minds14_lightmourne_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_lightmourne_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/Lightmourne/whisper-tiny-finetuned-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_en.md new file mode 100644 index 00000000000000..f81e4603b21d51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_polyai WhisperForCTC from giocs2017 +author: John Snow Labs +name: whisper_tiny_polyai +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_polyai` is a English model originally trained by giocs2017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_polyai_en_5.5.0_3.0_1726485501763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_polyai_en_5.5.0_3.0_1726485501763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_polyai","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_polyai", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_polyai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/giocs2017/whisper-tiny-polyai \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_thai_pattara12345_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_thai_pattara12345_en.md new file mode 100644 index 00000000000000..eeaeefab67b648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_thai_pattara12345_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_thai_pattara12345 WhisperForCTC from pattara12345 +author: John Snow Labs +name: whisper_tiny_thai_pattara12345 +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_thai_pattara12345` is a English model originally trained by pattara12345. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_pattara12345_en_5.5.0_3.0_1726478775141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_pattara12345_en_5.5.0_3.0_1726478775141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_pattara12345","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_pattara12345", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
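After `transform`, the decoded transcription sits in the `text` column produced by the snippet above and can be flattened for inspection:

```python
# Each annotation's `result` field holds the decoded transcription string.
pipelineDF.selectExpr("explode(text.result) as transcription").show(truncate=False)
```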
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_thai_pattara12345| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/pattara12345/whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_pipeline_en.md new file mode 100644 index 00000000000000..6cb84327d77d44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_v4_small_pipeline pipeline WhisperForCTC from karinthommen +author: John Snow Labs +name: whisper_v4_small_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_v4_small_pipeline` is a English model originally trained by karinthommen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_v4_small_pipeline_en_5.5.0_3.0_1726488277983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_v4_small_pipeline_en_5.5.0_3.0_1726488277983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_v4_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_v4_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_v4_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/karinthommen/whisper-V4-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_wolof_wo.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_wolof_wo.md new file mode 100644 index 00000000000000..dc4667e5079199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_wolof_wo.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Wolof whisper_wolof WhisperForCTC from cibfaye +author: John Snow Labs +name: whisper_wolof +date: 2024-09-16 +tags: [wo, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: wo +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_wolof` is a Wolof model originally trained by cibfaye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_wolof_wo_5.5.0_3.0_1726479232712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_wolof_wo_5.5.0_3.0_1726479232712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_wolof","wo") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_wolof", "wo")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_wolof| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|wo| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cibfaye/whisper-wolof \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..cc8727ba32a30d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726525657462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726525657462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_addressbook_test_content_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-addressbook_test-content_tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_language_detection_brenomatos_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_language_detection_brenomatos_en.md new file mode 100644 index 00000000000000..99f9030bc8b042 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_language_detection_brenomatos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_language_detection_brenomatos BertForSequenceClassification from brenomatos +author: John Snow Labs +name: xlm_roberta_base_finetuned_language_detection_brenomatos +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_language_detection_brenomatos` is a English model originally trained by brenomatos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_detection_brenomatos_en_5.5.0_3.0_1726459800736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_detection_brenomatos_en_5.5.0_3.0_1726459800736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("xlm_roberta_base_finetuned_language_detection_brenomatos","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("xlm_roberta_base_finetuned_language_detection_brenomatos", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
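With the column names used above, the predicted label for each input row can be read straight from the `class` output column:

```python
# One CATEGORY annotation per document; `result` holds the predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```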
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_language_detection_brenomatos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/brenomatos/xlm-roberta-base-finetuned-language-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline_en.md new file mode 100644 index 00000000000000..c159cddd451f4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline pipeline BertForSequenceClassification from brenomatos +author: John Snow Labs +name: xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline` is a English model originally trained by brenomatos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline_en_5.5.0_3.0_1726459831161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline_en_5.5.0_3.0_1726459831161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
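For quick, ad-hoc checks the same pipeline can also be driven with plain strings instead of a DataFrame; a hedged sketch reusing the `pipeline` object created above (the `class` key is an assumption based on the companion model card's output column):

```python
# annotate() runs the pipeline on plain strings and returns a dict keyed by output column;
# the predicted label is assumed to appear under "class".
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```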
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_language_detection_brenomatos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/brenomatos/xlm-roberta-base-finetuned-language-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_pipeline_en.md new file mode 100644 index 00000000000000..d73e6ade5f77dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jhagege_pipeline pipeline XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jhagege_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jhagege_pipeline` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jhagege_pipeline_en_5.5.0_3.0_1726495950602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jhagege_pipeline_en_5.5.0_3.0_1726495950602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jhagege_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jhagege_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jhagege_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..9b57d53e58edb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline` is a English model originally trained by LaurentiuStancioiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline_en_5.5.0_3.0_1726497570367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline_en_5.5.0_3.0_1726497570367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..3e9b03421090f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline` is a English model originally trained by tyayoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline_en_5.5.0_3.0_1726496478951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline_en_5.5.0_3.0_1726496478951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_bengali_deepaperi_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_bengali_deepaperi_en.md new file mode 100644 index 00000000000000..66aaafb20ccb82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_bengali_deepaperi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_bengali_deepaperi XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_bengali_deepaperi +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_bengali_deepaperi` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_bengali_deepaperi_en_5.5.0_3.0_1726495374345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_bengali_deepaperi_en_5.5.0_3.0_1726495374345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_bengali_deepaperi","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_bengali_deepaperi", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
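The `token` and `ner` columns produced above are position-aligned arrays, so predictions can be inspected side by side:

```python
# The i-th entry of ner.result is the tag predicted for the i-th token.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```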
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_bengali_deepaperi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.5 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-bn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline_en.md new file mode 100644 index 00000000000000..2382c358ee59f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline pipeline XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline` is a English model originally trained by alkampfer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline_en_5.5.0_3.0_1726496408885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline_en_5.5.0_3.0_1726496408885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_alkampfer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_en.md new file mode 100644 index 00000000000000..b2bfbd8ef1a2bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_param_mehta XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_param_mehta +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_param_mehta` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_param_mehta_en_5.5.0_3.0_1726495447469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_param_mehta_en_5.5.0_3.0_1726495447469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_param_mehta","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_param_mehta", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_param_mehta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|817.2 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_french_aiekek_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_french_aiekek_en.md new file mode 100644 index 00000000000000..95418ebe148d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_french_aiekek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_aiekek XlmRoBertaForTokenClassification from AIEKEK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_aiekek +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_aiekek` is a English model originally trained by AIEKEK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiekek_en_5.5.0_3.0_1726497427137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiekek_en_5.5.0_3.0_1726497427137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_aiekek","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_aiekek", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_aiekek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/AIEKEK/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_french_aiekek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_french_aiekek_pipeline_en.md new file mode 100644 index 00000000000000..12c211d4eb1c67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_french_aiekek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_aiekek_pipeline pipeline XlmRoBertaForTokenClassification from AIEKEK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_aiekek_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_aiekek_pipeline` is a English model originally trained by AIEKEK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiekek_pipeline_en_5.5.0_3.0_1726497515459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiekek_pipeline_en_5.5.0_3.0_1726497515459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_aiekek_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_aiekek_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_aiekek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/AIEKEK/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_pipeline_en.md new file mode 100644 index 00000000000000..ce9bfb3f6211e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_brouwer_pipeline pipeline XlmRoBertaForTokenClassification from brouwer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_brouwer_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_brouwer_pipeline` is a English model originally trained by brouwer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_brouwer_pipeline_en_5.5.0_3.0_1726495979443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_brouwer_pipeline_en_5.5.0_3.0_1726495979443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_brouwer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_brouwer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
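A hedged sketch of feeding the pipeline a DataFrame and reading the predicted tags back, reusing the `pipeline` object created above; it assumes the bundled DocumentAssembler reads from a `text` column and the token classifier writes to `ner`, mirroring the companion model cards:

```python
# Illustrative German example sentence; replace with your own text column.
df = spark.createDataFrame([["Angela Merkel besuchte gestern Paris."]]).toDF("text")

annotations = pipeline.transform(df)
annotations.select("token.result", "ner.result").show(truncate=False)
```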
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_brouwer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|574.7 MB| + +## References + +https://huggingface.co/brouwer/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline_en.md new file mode 100644 index 00000000000000..697eb1b2dd4e10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline pipeline XlmRoBertaForTokenClassification from Francois2511 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline` is a English model originally trained by Francois2511. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline_en_5.5.0_3.0_1726497080407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline_en_5.5.0_3.0_1726497080407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_francois2511_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Francois2511/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline_en.md new file mode 100644 index 00000000000000..2d880541f96116 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline pipeline XlmRoBertaForTokenClassification from jx7789 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline` is a English model originally trained by jx7789. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline_en_5.5.0_3.0_1726495124187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline_en_5.5.0_3.0_1726495124187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jx7789_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jx7789/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline_en.md new file mode 100644 index 00000000000000..6fdfca71b39efa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline pipeline XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline` is a English model originally trained by Msrisrujan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline_en_5.5.0_3.0_1726497726445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline_en_5.5.0_3.0_1726497726445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_msrisrujan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline_en.md new file mode 100644 index 00000000000000..356cc6121d565e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline pipeline XlmRoBertaForTokenClassification from yoyoyo1118 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline` is a English model originally trained by yoyoyo1118. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline_en_5.5.0_3.0_1726495261905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline_en_5.5.0_3.0_1726495261905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_yoyoyo1118_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/yoyoyo1118/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline_en.md new file mode 100644 index 00000000000000..6623e644c8a2d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline pipeline XlmRoBertaForTokenClassification from zdepablo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline` is a English model originally trained by zdepablo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline_en_5.5.0_3.0_1726497355141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline_en_5.5.0_3.0_1726497355141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zdepablo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/zdepablo/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_en.md new file mode 100644 index 00000000000000..f874b8279b1442 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_fyl1 XlmRoBertaForTokenClassification from fyl1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_fyl1 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_fyl1` is a English model originally trained by fyl1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fyl1_en_5.5.0_3.0_1726496180232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fyl1_en_5.5.0_3.0_1726496180232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_fyl1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_fyl1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
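+
+Once the example above has run, the predictions are plain Spark columns of annotations. A minimal sketch for inspecting them, assuming the `pipelineDF` produced by the Python snippet above:
+
+```python
+# tokens and their predicted entity tags, extracted as string arrays
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```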
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_fyl1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/fyl1/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_en.md new file mode 100644 index 00000000000000..508911df5c0850 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nadle XlmRoBertaForTokenClassification from nadle +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nadle +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_nadle` is a English model originally trained by nadle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nadle_en_5.5.0_3.0_1726495248781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nadle_en_5.5.0_3.0_1726495248781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nadle","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nadle", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nadle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/nadle/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_jhagege_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_jhagege_en.md new file mode 100644 index 00000000000000..bf24ecc7ea2422 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_jhagege_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_jhagege XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_jhagege +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_jhagege` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jhagege_en_5.5.0_3.0_1726495121592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jhagege_en_5.5.0_3.0_1726495121592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_jhagege","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_jhagege", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_jhagege| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline_en.md new file mode 100644 index 00000000000000..625736b0a85aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline pipeline XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline_en_5.5.0_3.0_1726495219849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline_en_5.5.0_3.0_1726495219849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_jhagege_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_paww_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_paww_en.md new file mode 100644 index 00000000000000..f7a281fb6cf405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_paww_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_paww XlmRoBertaForTokenClassification from paww +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_paww +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_paww` is a English model originally trained by paww. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_paww_en_5.5.0_3.0_1726496997315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_paww_en_5.5.0_3.0_1726496997315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_paww","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_paww", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_paww| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/paww/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_pockypocky_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_pockypocky_en.md new file mode 100644 index 00000000000000..c1b0ba8c2a1b3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_pockypocky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_pockypocky XlmRoBertaForTokenClassification from pockypocky +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_pockypocky +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_pockypocky` is a English model originally trained by pockypocky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_pockypocky_en_5.5.0_3.0_1726495759046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_pockypocky_en_5.5.0_3.0_1726495759046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_pockypocky","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_pockypocky", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_pockypocky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/pockypocky/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_sreek_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_sreek_en.md new file mode 100644 index 00000000000000..a09593d4dfd17f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_sreek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_sreek XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_sreek +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_sreek` is a English model originally trained by Sreek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_sreek_en_5.5.0_3.0_1726495555552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_sreek_en_5.5.0_3.0_1726495555552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_sreek","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_sreek", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_sreek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_sreek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_sreek_pipeline_en.md new file mode 100644 index 00000000000000..8c0df4a5e02c49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_italian_sreek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_sreek_pipeline pipeline XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_sreek_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_sreek_pipeline` is a English model originally trained by Sreek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_sreek_pipeline_en_5.5.0_3.0_1726495641768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_sreek_pipeline_en_5.5.0_3.0_1726495641768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_sreek_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_sreek_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_sreek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_en.md new file mode 100644 index 00000000000000..333d802c9988f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726516151998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726516151998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
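+
+For classifying single sentences, wrapping the fitted pipeline in a `LightPipeline` avoids the DataFrame round trip. A minimal sketch, assuming the `pipelineModel` from the Python example above; the `class` key matches the classifier's output column:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# annotate() returns a dict of output-column name -> list of results
+print(light.annotate("I love spark-nlp")["class"])
+```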
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|817.4 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_kin-hau-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..a474e8314d0e3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1726516277248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1726516277248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_kinyarwanda_hau_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|817.4 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_kin-hau-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_conll2003_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_conll2003_en.md new file mode 100644 index 00000000000000..44e789b11cf093 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_conll2003_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_conll2003 XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: xlm_roberta_conll2003 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_conll2003` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_conll2003_en_5.5.0_3.0_1726495880915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_conll2003_en_5.5.0_3.0_1726495880915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_conll2003","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_conll2003", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
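+
+The label set a token classifier predicts is stored on the annotator itself. A small sketch, assuming the `tokenClassifier` defined in the Python example above; for a CoNLL-2003 model this is expected to be the usual PER/ORG/LOC/MISC IOB tags, but the source card does not list them explicitly:
+
+```python
+# list the NER labels this checkpoint was trained on
+print(tokenClassifier.getClasses())
+```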
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_conll2003| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.6 MB| + +## References + +https://huggingface.co/manirai91/xlm-roberta-conll2003 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline_en.md new file mode 100644 index 00000000000000..aff33bb9f7d9f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline_en_5.5.0_3.0_1726517166511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline_en_5.5.0_3.0_1726517166511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-1-client-toxic-Krum-non-IID-Fed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_pipeline_en.md new file mode 100644 index 00000000000000..16b7ce59758e58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English yt_special_batch8_tiny_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: yt_special_batch8_tiny_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yt_special_batch8_tiny_pipeline` is a English model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yt_special_batch8_tiny_pipeline_en_5.5.0_3.0_1726483710970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yt_special_batch8_tiny_pipeline_en_5.5.0_3.0_1726483710970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("yt_special_batch8_tiny_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("yt_special_batch8_tiny_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
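+
+The `df` in the example above must contain raw audio rather than text: the pipeline starts with an `AudioAssembler`, which consumes a column of floating-point samples. The following is only a sketch under stated assumptions: it assumes 16 kHz mono input, an input column named `audio_content`, a hypothetical local file `sample.wav`, and uses `librosa` (not a Spark NLP dependency) to read the audio:
+
+```python
+import librosa
+
+# load the waveform as a list of floats at the sample rate Whisper expects (assumed 16 kHz)
+samples, _ = librosa.load("sample.wav", sr=16000)
+df = spark.createDataFrame([[samples.tolist()]], ["audio_content"])
+result = pipeline.transform(df)
+```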
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yt_special_batch8_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/TheRains/yt-special-batch8-tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-zhask_70_15_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-zhask_70_15_15_pipeline_en.md new file mode 100644 index 00000000000000..2bd389e775ded9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-zhask_70_15_15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English zhask_70_15_15_pipeline pipeline RoBertaForSequenceClassification from tegaranggana +author: John Snow Labs +name: zhask_70_15_15_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`zhask_70_15_15_pipeline` is a English model originally trained by tegaranggana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/zhask_70_15_15_pipeline_en_5.5.0_3.0_1726527299645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/zhask_70_15_15_pipeline_en_5.5.0_3.0_1726527299645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("zhask_70_15_15_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("zhask_70_15_15_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|zhask_70_15_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/tegaranggana/zhask_70_15_15 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-affilgood_ner_test_v5_en.md b/docs/_posts/ahmedlone127/2024-09-17-affilgood_ner_test_v5_en.md new file mode 100644 index 00000000000000..12ba3723b8da25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-affilgood_ner_test_v5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English affilgood_ner_test_v5 RoBertaForTokenClassification from nicolauduran45 +author: John Snow Labs +name: affilgood_ner_test_v5 +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affilgood_ner_test_v5` is a English model originally trained by nicolauduran45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v5_en_5.5.0_3.0_1726537494157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v5_en_5.5.0_3.0_1726537494157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("affilgood_ner_test_v5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("affilgood_ner_test_v5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
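+
+To avoid re-downloading the model on every run, the fitted pipeline can be persisted with standard Spark ML I/O. A minimal sketch, assuming the `pipelineModel` from the Python example above and a writable path of your choice (the path below is only an illustration):
+
+```python
+from pyspark.ml import PipelineModel
+
+# save once, then reload locally on later runs
+pipelineModel.write().overwrite().save("/tmp/affilgood_ner_test_v5_pipeline")
+restored = PipelineModel.load("/tmp/affilgood_ner_test_v5_pipeline")
+```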
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affilgood_ner_test_v5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/nicolauduran45/affilgood-ner-test-v5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-affilgood_ner_test_v5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-affilgood_ner_test_v5_pipeline_en.md new file mode 100644 index 00000000000000..0f8777fefbd0ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-affilgood_ner_test_v5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English affilgood_ner_test_v5_pipeline pipeline RoBertaForTokenClassification from nicolauduran45 +author: John Snow Labs +name: affilgood_ner_test_v5_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affilgood_ner_test_v5_pipeline` is a English model originally trained by nicolauduran45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v5_pipeline_en_5.5.0_3.0_1726537519126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v5_pipeline_en_5.5.0_3.0_1726537519126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("affilgood_ner_test_v5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("affilgood_ner_test_v5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affilgood_ner_test_v5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/nicolauduran45/affilgood-ner-test-v5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-autotrain_qaindobert_44800112738_en.md b/docs/_posts/ahmedlone127/2024-09-17-autotrain_qaindobert_44800112738_en.md new file mode 100644 index 00000000000000..8302b665c48748 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-autotrain_qaindobert_44800112738_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English autotrain_qaindobert_44800112738 BertForQuestionAnswering from reyhanAfri +author: John Snow Labs +name: autotrain_qaindobert_44800112738 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_qaindobert_44800112738` is a English model originally trained by reyhanAfri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_qaindobert_44800112738_en_5.5.0_3.0_1726532135521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_qaindobert_44800112738_en_5.5.0_3.0_1726532135521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("autotrain_qaindobert_44800112738","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("autotrain_qaindobert_44800112738", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
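Once the pipeline has run, the predicted span lives in the `answer` output column of `pipelineDF`. A small sketch, continuing from the Python example above, that prints the question alongside the extracted answer and whatever metadata the annotator attaches:

```python
# Show the question text next to the predicted answer span.
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)

# Each answer is a full Spark NLP annotation, so its metadata can be inspected as well.
pipelineDF.selectExpr("explode(answer) as ans") \
    .select("ans.result", "ans.metadata") \
    .show(truncate=False)
```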
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_qaindobert_44800112738| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/reyhanAfri/autotrain-qaindobert-44800112738 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-autotrain_qaindobert_44800112738_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-autotrain_qaindobert_44800112738_pipeline_en.md new file mode 100644 index 00000000000000..a03a368e3061dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-autotrain_qaindobert_44800112738_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English autotrain_qaindobert_44800112738_pipeline pipeline BertForQuestionAnswering from reyhanAfri +author: John Snow Labs +name: autotrain_qaindobert_44800112738_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_qaindobert_44800112738_pipeline` is a English model originally trained by reyhanAfri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_qaindobert_44800112738_pipeline_en_5.5.0_3.0_1726532137901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_qaindobert_44800112738_pipeline_en_5.5.0_3.0_1726532137901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_qaindobert_44800112738_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_qaindobert_44800112738_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_qaindobert_44800112738_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/reyhanAfri/autotrain-qaindobert-44800112738 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bangla_asr_v7_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-17-bangla_asr_v7_pipeline_bn.md new file mode 100644 index 00000000000000..72588610921d14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bangla_asr_v7_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali bangla_asr_v7_pipeline pipeline WhisperForCTC from arif11 +author: John Snow Labs +name: bangla_asr_v7_pipeline +date: 2024-09-17 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bangla_asr_v7_pipeline` is a Bengali model originally trained by arif11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bangla_asr_v7_pipeline_bn_5.5.0_3.0_1726540070642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bangla_asr_v7_pipeline_bn_5.5.0_3.0_1726540070642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bangla_asr_v7_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bangla_asr_v7_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
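The example above references a DataFrame `df` that is never defined; for a speech pipeline it has to carry decoded audio rather than text. A rough sketch under the assumption that this pipeline's AudioAssembler reads 16 kHz float samples from an `audio_content` column (the silent buffer is only a placeholder so the call runs):

```python
# Placeholder audio: one second of silence at 16 kHz. Real use would load and decode a file.
raw_floats = [0.0] * 16000
df = spark.createDataFrame([[raw_floats]]).toDF("audio_content")

annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```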
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bangla_asr_v7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/arif11/bangla-ASR-v7 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34_en.md b/docs/_posts/ahmedlone127/2024-09-17-base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34_en.md new file mode 100644 index 00000000000000..0e36264ebc3701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34_en_5.5.0_3.0_1726549255277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34_en_5.5.0_3.0_1726549255277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is assumed to already hold decoded audio floats in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is assumed to already hold decoded audio floats in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
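The fit/transform calls above assume a `data` DataFrame that the card never builds. A minimal sketch of one way to construct it, assuming the audio has already been decoded to 16 kHz mono floats (the silent buffer below is purely illustrative):

```python
# Placeholder audio: one second of silence at 16 kHz; real input would come from a decoded file.
raw_floats = [0.0] * 16000
data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")

pipelineDF = pipeline.fit(data).transform(data)
pipelineDF.select("text.result").show(truncate=False)
```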
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_2_0_1_8_1e_05_dulcet_sweep_34| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-2-0.1-8-1e-05-dulcet-sweep-34 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_german_cased_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_german_cased_finetuned_squad_en.md new file mode 100644 index 00000000000000..ddcef9c22717fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_german_cased_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_german_cased_finetuned_squad BertForQuestionAnswering from princetyagi +author: John Snow Labs +name: bert_base_german_cased_finetuned_squad +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_finetuned_squad` is a English model originally trained by princetyagi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_squad_en_5.5.0_3.0_1726554595026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_squad_en_5.5.0_3.0_1726554595026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_german_cased_finetuned_squad","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_german_cased_finetuned_squad", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/princetyagi/bert-base-german-cased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_en.md new file mode 100644 index 00000000000000..a8591bbcc9db06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_en_5.5.0_3.0_1726545179991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_en_5.5.0_3.0_1726545179991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904182329 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_en.md new file mode 100644 index 00000000000000..7a1cb977d32ac8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_en_5.5.0_3.0_1726532288997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_en_5.5.0_3.0_1726532288997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.12-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline_en.md new file mode 100644 index 00000000000000..b923cfc8fecd3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline_en_5.5.0_3.0_1726532309863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline_en_5.5.0_3.0_1726532309863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_12_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.12-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-200 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-breeze_dsw_tiny_indonesian_id.md b/docs/_posts/ahmedlone127/2024-09-17-breeze_dsw_tiny_indonesian_id.md new file mode 100644 index 00000000000000..829b07d5107a94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-breeze_dsw_tiny_indonesian_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian breeze_dsw_tiny_indonesian WhisperForCTC from hanasim +author: John Snow Labs +name: breeze_dsw_tiny_indonesian +date: 2024-09-17 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`breeze_dsw_tiny_indonesian` is a Indonesian model originally trained by hanasim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/breeze_dsw_tiny_indonesian_id_5.5.0_3.0_1726552102791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/breeze_dsw_tiny_indonesian_id_5.5.0_3.0_1726552102791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("breeze_dsw_tiny_indonesian","id") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is assumed to already hold decoded audio floats in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("breeze_dsw_tiny_indonesian", "id")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is assumed to already hold decoded audio floats in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
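The same caveat applies here: `data` must already contain decoded audio floats in the `audio_content` column before the pipeline is fitted. Continuing from the Python example above, the Indonesian transcription can then be read from the `text` output column:

```python
# Assumes `data` carries decoded 16 kHz audio floats in "audio_content" (see the note above).
pipelineDF.selectExpr("explode(text.result) as transcription").show(truncate=False)
```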
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|breeze_dsw_tiny_indonesian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|242.6 MB| + +## References + +https://huggingface.co/hanasim/breeze-dsw-tiny-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-breeze_dsw_tiny_indonesian_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-17-breeze_dsw_tiny_indonesian_pipeline_id.md new file mode 100644 index 00000000000000..ae9259943d047d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-breeze_dsw_tiny_indonesian_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian breeze_dsw_tiny_indonesian_pipeline pipeline WhisperForCTC from hanasim +author: John Snow Labs +name: breeze_dsw_tiny_indonesian_pipeline +date: 2024-09-17 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`breeze_dsw_tiny_indonesian_pipeline` is a Indonesian model originally trained by hanasim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/breeze_dsw_tiny_indonesian_pipeline_id_5.5.0_3.0_1726552171316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/breeze_dsw_tiny_indonesian_pipeline_id_5.5.0_3.0_1726552171316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("breeze_dsw_tiny_indonesian_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("breeze_dsw_tiny_indonesian_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|breeze_dsw_tiny_indonesian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|242.6 MB| + +## References + +https://huggingface.co/hanasim/breeze-dsw-tiny-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_drugtemist_es.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_drugtemist_es.md new file mode 100644 index 00000000000000..6e98c857e3c41a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_drugtemist_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_drugtemist RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_drugtemist +date: 2024-09-17 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_drugtemist` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_drugtemist_es_5.5.0_3.0_1726537827029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_drugtemist_es_5.5.0_3.0_1726537827029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_drugtemist","es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_drugtemist", "es")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
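The model emits one IOB tag per token in the `ner` column. A short sketch, continuing from the Python example above, for inspecting those tags and, optionally, grouping them into entity chunks with the generic `NerConverter` annotator (the converter is a standard Spark NLP component, not something specific to this card):

```python
# One IOB tag per token.
pipelineDF.selectExpr("explode(ner.result) as ner_tag").show(truncate=False)

# Optional: add a NerConverter stage before fitting to get whole entity chunks instead of tags.
from sparknlp.annotator import NerConverter

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")
```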
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_drugtemist| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|441.8 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-drugtemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_drugtemist_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_drugtemist_pipeline_es.md new file mode 100644 index 00000000000000..6325ea90fa096e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_drugtemist_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_drugtemist_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_drugtemist_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_drugtemist_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_drugtemist_pipeline_es_5.5.0_3.0_1726537850827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_drugtemist_pipeline_es_5.5.0_3.0_1726537850827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_drugtemist_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_drugtemist_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_drugtemist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|441.8 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-drugtemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_lash_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_lash_en.md new file mode 100644 index 00000000000000..fc23cc0323bf70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_lash_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_lash DistilBertForQuestionAnswering from lash +author: John Snow Labs +name: burmese_awesome_qa_model_lash +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_lash` is a English model originally trained by lash. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_lash_en_5.5.0_3.0_1726555465308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_lash_en_5.5.0_3.0_1726555465308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_lash","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_lash", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
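Because the fitted model is an ordinary Spark ML `PipelineModel`, it can score many question/context pairs in one pass. A small sketch continuing from the Python example above; the second pair is invented purely for illustration:

```python
# Batch several question/context pairs; column names must match the MultiDocumentAssembler inputs.
more_data = spark.createDataFrame([
    ["What framework do I use?", "I use spark-nlp."],
    ["Where is the office?", "The office is in Berlin."],
]).toDF("question", "context")

pipelineModel.transform(more_data).select("question", "answer.result").show(truncate=False)
```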
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_lash| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/lash/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_lash_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_lash_pipeline_en.md new file mode 100644 index 00000000000000..d469b1617be93f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_lash_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_lash_pipeline pipeline DistilBertForQuestionAnswering from lash +author: John Snow Labs +name: burmese_awesome_qa_model_lash_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_lash_pipeline` is a English model originally trained by lash. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_lash_pipeline_en_5.5.0_3.0_1726555477794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_lash_pipeline_en_5.5.0_3.0_1726555477794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_lash_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_lash_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_lash_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/lash/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_nada_ghazouani_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_nada_ghazouani_pipeline_en.md new file mode 100644 index 00000000000000..667f5de9884d8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_nada_ghazouani_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_nada_ghazouani_pipeline pipeline DistilBertForQuestionAnswering from nada-ghazouani +author: John Snow Labs +name: burmese_awesome_qa_model_nada_ghazouani_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_nada_ghazouani_pipeline` is a English model originally trained by nada-ghazouani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_nada_ghazouani_pipeline_en_5.5.0_3.0_1726555229384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_nada_ghazouani_pipeline_en_5.5.0_3.0_1726555229384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_nada_ghazouani_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_nada_ghazouani_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_nada_ghazouani_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/nada-ghazouani/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_qa_model_yadah_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_qa_model_yadah_pipeline_en.md new file mode 100644 index 00000000000000..b83b8e4509c2da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_qa_model_yadah_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_qa_model_yadah_pipeline pipeline DistilBertForQuestionAnswering from Yadah +author: John Snow Labs +name: burmese_qa_model_yadah_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_qa_model_yadah_pipeline` is a English model originally trained by Yadah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_qa_model_yadah_pipeline_en_5.5.0_3.0_1726555832656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_qa_model_yadah_pipeline_en_5.5.0_3.0_1726555832656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_qa_model_yadah_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_qa_model_yadah_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_qa_model_yadah_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Yadah/my_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_en.md b/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_en.md new file mode 100644 index 00000000000000..02cc38c7614d9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English chinese_roberta_wwm_ext_2_0_8_ddp BertForQuestionAnswering from DaydreamerF +author: John Snow Labs +name: chinese_roberta_wwm_ext_2_0_8_ddp +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_roberta_wwm_ext_2_0_8_ddp` is a English model originally trained by DaydreamerF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_2_0_8_ddp_en_5.5.0_3.0_1726544231697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_2_0_8_ddp_en_5.5.0_3.0_1726544231697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("chinese_roberta_wwm_ext_2_0_8_ddp","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("chinese_roberta_wwm_ext_2_0_8_ddp", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
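The fitted pipeline behaves like any Spark ML `PipelineModel`, so it can be persisted once and reloaded without refitting. A brief sketch continuing from the Python example above; the save path is just an example location:

```python
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/chinese_roberta_wwm_ext_qa_pipeline")
restored = PipelineModel.load("/tmp/chinese_roberta_wwm_ext_qa_pipeline")
restored.transform(data).select("answer.result").show(truncate=False)
```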
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_roberta_wwm_ext_2_0_8_ddp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/DaydreamerF/chinese-roberta-wwm-ext-2.0-8-ddp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dataequity_opus_maltese_english_tagalog_en.md b/docs/_posts/ahmedlone127/2024-09-17-dataequity_opus_maltese_english_tagalog_en.md new file mode 100644 index 00000000000000..18297acde58a58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dataequity_opus_maltese_english_tagalog_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dataequity_opus_maltese_english_tagalog MarianTransformer from dataequity +author: John Snow Labs +name: dataequity_opus_maltese_english_tagalog +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataequity_opus_maltese_english_tagalog` is a English model originally trained by dataequity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_english_tagalog_en_5.5.0_3.0_1726533249724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_english_tagalog_en_5.5.0_3.0_1726533249724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("dataequity_opus_maltese_english_tagalog","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("dataequity_opus_maltese_english_tagalog", "en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
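Each sentence found by the sentence detector is translated separately, so the `translation` column holds one result per detected sentence. Continuing from the Python example above:

```python
# One translated string per detected sentence.
pipelineDF.selectExpr("explode(translation.result) as translation").show(truncate=False)
```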
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataequity_opus_maltese_english_tagalog| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|496.7 MB| + +## References + +https://huggingface.co/dataequity/dataequity-opus-mt-en-tl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dipromats_subtask_1_base_train_en.md b/docs/_posts/ahmedlone127/2024-09-17-dipromats_subtask_1_base_train_en.md new file mode 100644 index 00000000000000..7532f882cc5a1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dipromats_subtask_1_base_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dipromats_subtask_1_base_train XlmRoBertaForSequenceClassification from franfj +author: John Snow Labs +name: dipromats_subtask_1_base_train +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dipromats_subtask_1_base_train` is a English model originally trained by franfj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dipromats_subtask_1_base_train_en_5.5.0_3.0_1726536498631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dipromats_subtask_1_base_train_en_5.5.0_3.0_1726536498631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("dipromats_subtask_1_base_train","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("dipromats_subtask_1_base_train", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
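The predicted label for each input row ends up in the `class` output column. Continuing from the Python example above:

```python
# Input text next to its predicted class label.
pipelineDF.select("text", "class.result").show(truncate=False)
```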
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dipromats_subtask_1_base_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|806.5 MB| + +## References + +https://huggingface.co/franfj/DIPROMATS_subtask_1_base_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_distilled_squad_full_lora_merged_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_distilled_squad_full_lora_merged_en.md new file mode 100644 index 00000000000000..938a468b34e1e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_distilled_squad_full_lora_merged_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_full_lora_merged DistilBertForQuestionAnswering from JeukHwang +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_full_lora_merged +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_full_lora_merged` is a English model originally trained by JeukHwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_full_lora_merged_en_5.5.0_3.0_1726555427765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_full_lora_merged_en_5.5.0_3.0_1726555427765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_full_lora_merged","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_full_lora_merged", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_full_lora_merged| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/JeukHwang/distilbert-base-cased-distilled-squad-full-lora-merged \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline_en.md new file mode 100644 index 00000000000000..0255481cc810bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline pipeline DistilBertForQuestionAnswering from serhii-korobchenko +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline` is a English model originally trained by serhii-korobchenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline_en_5.5.0_3.0_1726555589706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline_en_5.5.0_3.0_1726555589706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
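+
+The snippets above assume an existing DataFrame `df`. A minimal sketch of building one for a quick test; the "question"/"context" column names are an assumption based on the pipeline's MultiDocumentAssembler stage and may need adjusting:
+
+```python
+# Hypothetical input frame for the pretrained question-answering pipeline above.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+```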
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_serhii_korobchenko_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/serhii-korobchenko/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline_en.md new file mode 100644 index 00000000000000..03578e01c89065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline pipeline DistilBertForQuestionAnswering from Hashemghanem +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline` is a English model originally trained by Hashemghanem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline_en_5.5.0_3.0_1726555352165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline_en_5.5.0_3.0_1726555352165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hashemghanem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Hashemghanem/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p35_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p35_pipeline_en.md new file mode 100644 index 00000000000000..f1d7e14cc94ca6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p35_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p35_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p35_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p35_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p35_pipeline_en_5.5.0_3.0_1726555414504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p35_pipeline_en_5.5.0_3.0_1726555414504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p35_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p35_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p35_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|206.5 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p35 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-eng_med_en.md b/docs/_posts/ahmedlone127/2024-09-17-eng_med_en.md new file mode 100644 index 00000000000000..c9858bb1440d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-eng_med_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English eng_med MarianTransformer from rickySaka +author: John Snow Labs +name: eng_med +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eng_med` is a English model originally trained by rickySaka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eng_med_en_5.5.0_3.0_1726533019659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eng_med_en_5.5.0_3.0_1726533019659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("eng_med","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("eng_med","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
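+
+After `transform`, the translated sentences are available in the `translation` output column. A minimal sketch, assuming the Python pipeline defined above:
+
+```python
+# One translated string per sentence detected by the sentence detector stage.
+pipelineDF.select("translation.result").show(truncate=False)
+```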
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eng_med| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|530.3 MB| + +## References + +https://huggingface.co/rickySaka/eng-med \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-ep15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-ep15_pipeline_en.md new file mode 100644 index 00000000000000..e6efedd3ad4dde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-ep15_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ep15_pipeline pipeline WhisperForCTC from JoeTan +author: John Snow Labs +name: ep15_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ep15_pipeline` is a English model originally trained by JoeTan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ep15_pipeline_en_5.5.0_3.0_1726542171186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ep15_pipeline_en_5.5.0_3.0_1726542171186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ep15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ep15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ep15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JoeTan/Ep15 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-facebook_commet_classification_small_v2_en.md b/docs/_posts/ahmedlone127/2024-09-17-facebook_commet_classification_small_v2_en.md new file mode 100644 index 00000000000000..bec9eab1f91368 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-facebook_commet_classification_small_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English facebook_commet_classification_small_v2 XlmRoBertaForSequenceClassification from DuongTrongChi +author: John Snow Labs +name: facebook_commet_classification_small_v2 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`facebook_commet_classification_small_v2` is a English model originally trained by DuongTrongChi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/facebook_commet_classification_small_v2_en_5.5.0_3.0_1726535963039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/facebook_commet_classification_small_v2_en_5.5.0_3.0_1726535963039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("facebook_commet_classification_small_v2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("facebook_commet_classification_small_v2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|facebook_commet_classification_small_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|365.9 MB| + +## References + +https://huggingface.co/DuongTrongChi/facebook-commet-classification-small-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-facebook_commet_classification_small_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-facebook_commet_classification_small_v2_pipeline_en.md new file mode 100644 index 00000000000000..5dfbfbc2fc9e3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-facebook_commet_classification_small_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English facebook_commet_classification_small_v2_pipeline pipeline XlmRoBertaForSequenceClassification from DuongTrongChi +author: John Snow Labs +name: facebook_commet_classification_small_v2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`facebook_commet_classification_small_v2_pipeline` is a English model originally trained by DuongTrongChi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/facebook_commet_classification_small_v2_pipeline_en_5.5.0_3.0_1726535980337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/facebook_commet_classification_small_v2_pipeline_en_5.5.0_3.0_1726535980337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("facebook_commet_classification_small_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("facebook_commet_classification_small_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|facebook_commet_classification_small_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|365.9 MB| + +## References + +https://huggingface.co/DuongTrongChi/facebook-commet-classification-small-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuned_bert_model_squad_datset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuned_bert_model_squad_datset_pipeline_en.md new file mode 100644 index 00000000000000..70489764c000f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuned_bert_model_squad_datset_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetuned_bert_model_squad_datset_pipeline pipeline DistilBertForQuestionAnswering from AlyGreo +author: John Snow Labs +name: finetuned_bert_model_squad_datset_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_model_squad_datset_pipeline` is a English model originally trained by AlyGreo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_model_squad_datset_pipeline_en_5.5.0_3.0_1726555344077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_model_squad_datset_pipeline_en_5.5.0_3.0_1726555344077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bert_model_squad_datset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bert_model_squad_datset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_model_squad_datset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AlyGreo/finetuned-bert-model-squad-datset + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_en.md new file mode 100644 index 00000000000000..fc81c5c821450e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base XlmRoBertaForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_en_5.5.0_3.0_1726536087086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_en_5.5.0_3.0_1726536087086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|853.2 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-iitp_pdt_review-additionalpretrained-xlm-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..c872383c42f8b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline pipeline XlmRoBertaForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726536147284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726536147284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_iitp_pdt_review_additionalpretrained_xlm_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.2 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-iitp_pdt_review-additionalpretrained-xlm-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-hate_hate_balance_random3_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-17-hate_hate_balance_random3_seed2_bernice_en.md new file mode 100644 index 00000000000000..822fd3ce618214 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-hate_hate_balance_random3_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed2_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed2_bernice +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed2_bernice_en_5.5.0_3.0_1726536372952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed2_bernice_en_5.5.0_3.0_1726536372952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed2_bernice","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed2_bernice", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.4 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-hate_hate_balance_random3_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-hate_hate_balance_random3_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..23c36bf309c482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-hate_hate_balance_random3_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed2_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed2_bernice_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed2_bernice_pipeline_en_5.5.0_3.0_1726536522192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed2_bernice_pipeline_en_5.5.0_3.0_1726536522192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random3_seed2_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random3_seed2_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.4 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-hfa_poly_english_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-hfa_poly_english_small_pipeline_en.md new file mode 100644 index 00000000000000..9fa2f268f0dc81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-hfa_poly_english_small_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English hfa_poly_english_small_pipeline pipeline WhisperForCTC from kurianbenoy +author: John Snow Labs +name: hfa_poly_english_small_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hfa_poly_english_small_pipeline` is a English model originally trained by kurianbenoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hfa_poly_english_small_pipeline_en_5.5.0_3.0_1726541977940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hfa_poly_english_small_pipeline_en_5.5.0_3.0_1726541977940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hfa_poly_english_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hfa_poly_english_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hfa_poly_english_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/kurianbenoy/hfa-poly_english_small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline_en.md new file mode 100644 index 00000000000000..755f2203789760 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline pipeline WhisperForCTC from rohitp1 +author: John Snow Labs +name: kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline` is a English model originally trained by rohitp1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline_en_5.5.0_3.0_1726539886078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline_en_5.5.0_3.0_1726539886078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rohitp1/kkkh_whisper_small_distillation_att_loss_libri360_epochs_100_batch_4_concat_dataset_try2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-malasar_luke_dict_nan.md b/docs/_posts/ahmedlone127/2024-09-17-malasar_luke_dict_nan.md new file mode 100644 index 00000000000000..d91ef8ed1c36ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-malasar_luke_dict_nan.md @@ -0,0 +1,84 @@ +--- +layout: model +title: None malasar_luke_dict WhisperForCTC from leenag +author: John Snow Labs +name: malasar_luke_dict +date: 2024-09-17 +tags: [nan, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malasar_luke_dict` is a None model originally trained by leenag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malasar_luke_dict_nan_5.5.0_3.0_1726550728182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malasar_luke_dict_nan_5.5.0_3.0_1726550728182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("malasar_luke_dict","nan") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# "data" is assumed to be a DataFrame with raw audio samples (array of floats) in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("malasar_luke_dict", "nan")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// "data" is assumed to be a DataFrame with raw audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
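+
+The example above assumes a DataFrame `data` already exists. A purely illustrative sketch of how such a frame could be built; decoding a real audio file into float samples is outside the scope of this card:
+
+```python
+# Hypothetical "data" DataFrame: AudioAssembler expects an array of float samples per row.
+# One second of silence at 16 kHz, used only to keep the example self-contained.
+raw_audio = [0.0] * 16000
+data = spark.createDataFrame([(raw_audio,)], ["audio_content"])
+```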
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malasar_luke_dict| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nan| +|Size:|1.7 GB| + +## References + +https://huggingface.co/leenag/Malasar_Luke_Dict \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-maltese_english_vietnamese_hieundx_en.md b/docs/_posts/ahmedlone127/2024-09-17-maltese_english_vietnamese_hieundx_en.md new file mode 100644 index 00000000000000..daf9eb1daa6c4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-maltese_english_vietnamese_hieundx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_english_vietnamese_hieundx MarianTransformer from hieundx +author: John Snow Labs +name: maltese_english_vietnamese_hieundx +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_english_vietnamese_hieundx` is a English model originally trained by hieundx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_english_vietnamese_hieundx_en_5.5.0_3.0_1726533302526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_english_vietnamese_hieundx_en_5.5.0_3.0_1726533302526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("maltese_english_vietnamese_hieundx","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("maltese_english_vietnamese_hieundx","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_english_vietnamese_hieundx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|474.6 MB| + +## References + +https://huggingface.co/hieundx/mt-en-vi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-maltese_english_vietnamese_hieundx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-maltese_english_vietnamese_hieundx_pipeline_en.md new file mode 100644 index 00000000000000..e8f43160d8444b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-maltese_english_vietnamese_hieundx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maltese_english_vietnamese_hieundx_pipeline pipeline MarianTransformer from hieundx +author: John Snow Labs +name: maltese_english_vietnamese_hieundx_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_english_vietnamese_hieundx_pipeline` is a English model originally trained by hieundx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_english_vietnamese_hieundx_pipeline_en_5.5.0_3.0_1726533325368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_english_vietnamese_hieundx_pipeline_en_5.5.0_3.0_1726533325368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_english_vietnamese_hieundx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_english_vietnamese_hieundx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_english_vietnamese_hieundx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.2 MB| + +## References + +https://huggingface.co/hieundx/mt-en-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-malwhisper_v1_small_ml.md b/docs/_posts/ahmedlone127/2024-09-17-malwhisper_v1_small_ml.md new file mode 100644 index 00000000000000..d6d511c077bd2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-malwhisper_v1_small_ml.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Malayalam malwhisper_v1_small WhisperForCTC from smcproject +author: John Snow Labs +name: malwhisper_v1_small +date: 2024-09-17 +tags: [ml, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ml +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malwhisper_v1_small` is a Malayalam model originally trained by smcproject. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malwhisper_v1_small_ml_5.5.0_3.0_1726542348439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malwhisper_v1_small_ml_5.5.0_3.0_1726542348439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("malwhisper_v1_small","ml") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# "data" is assumed to be a DataFrame with raw audio samples (array of floats) in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("malwhisper_v1_small", "ml")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// "data" is assumed to be a DataFrame with raw audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malwhisper_v1_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ml| +|Size:|1.7 GB| + +## References + +https://huggingface.co/smcproject/Malwhisper-v1-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_en.md new file mode 100644 index 00000000000000..e9f41354402659 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis MarianTransformer from spasis +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis` is a English model originally trained by spasis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_en_5.5.0_3.0_1726533128081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_en_5.5.0_3.0_1726533128081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/spasis/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline_en.md new file mode 100644 index 00000000000000..42864822a84e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline pipeline MarianTransformer from spasis +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline` is a English model originally trained by spasis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline_en_5.5.0_3.0_1726533153022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline_en_5.5.0_3.0_1726533153022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_spasis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/spasis/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-metaqa_en.md b/docs/_posts/ahmedlone127/2024-09-17-metaqa_en.md new file mode 100644 index 00000000000000..acf8f7bf6df4ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-metaqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English metaqa BertForQuestionAnswering from haritzpuerto +author: John Snow Labs +name: metaqa +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`metaqa` is a English model originally trained by haritzpuerto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/metaqa_en_5.5.0_3.0_1726554157828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/metaqa_en_5.5.0_3.0_1726554157828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("metaqa","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("metaqa", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
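+
+Once `pipelineDF` has been computed as above, the predicted answer spans are stored as Spark NLP annotations in the `answer` column, with the plain answer text in their `result` field. A minimal sketch of reading it back out, assuming the column wiring from the snippet above:
+
+```python
+# Each row holds an array of answer annotations; explode to one answer per row.
+pipelineDF.selectExpr("explode(answer.result) as answer").show(truncate=False)
+```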
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|metaqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/haritzpuerto/MetaQA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-metaqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-metaqa_pipeline_en.md new file mode 100644 index 00000000000000..144a54416ab3ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-metaqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English metaqa_pipeline pipeline BertForQuestionAnswering from haritzpuerto +author: John Snow Labs +name: metaqa_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`metaqa_pipeline` is a English model originally trained by haritzpuerto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/metaqa_pipeline_en_5.5.0_3.0_1726554177604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/metaqa_pipeline_en_5.5.0_3.0_1726554177604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("metaqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("metaqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|metaqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/haritzpuerto/MetaQA + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french_en.md new file mode 100644 index 00000000000000..9b525d23fad4c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french MarianTransformer from wuuthraad +author: John Snow Labs +name: opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french` is a English model originally trained by wuuthraad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french_en_5.5.0_3.0_1726532962280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french_en_5.5.0_3.0_1726532962280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
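+
+With the columns wired as in the snippet above, the translated text ends up in the `translation` annotation column. A short sketch of reading it out, assuming that column name:
+
+```python
+# One translated sentence per row, taken from the annotation "result" field.
+pipelineDF.selectExpr("explode(translation.result) as translation").show(truncate=False)
+```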
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_french_finetuned_dyu_tonga_tonga_islands_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/wuuthraad/opus-mt-en-fr-finetuned-dyu-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline_en.md new file mode 100644 index 00000000000000..533e67310cf33d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline pipeline MarianTransformer from ani-baghdasaryan +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline` is a English model originally trained by ani-baghdasaryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline_en_5.5.0_3.0_1726533199728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline_en_5.5.0_3.0_1726533199728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ani_baghdasaryan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/ani-baghdasaryan/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline_en.md new file mode 100644 index 00000000000000..39eee997eb80f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline pipeline MarianTransformer from thanhnguyenvn +author: John Snow Labs +name: opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline` is a English model originally trained by thanhnguyenvn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline_en_5.5.0_3.0_1726533073145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline_en_5.5.0_3.0_1726533073145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
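+
+For quick, single-string experiments it is usually not necessary to build a DataFrame at all: `PretrainedPipeline` also exposes an `annotate` helper that runs the same stages on plain text. A small sketch (the example sentence is illustrative only):
+
+```python
+pipeline = PretrainedPipeline("opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline", lang = "en")
+
+# Returns a dict mapping each output column to its annotation results.
+print(pipeline.annotate("I love spark-nlp"))
+```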
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_vietnamese_finetuned_english_tonga_tonga_islands_vietnamese_thanhnguyenvn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.1 MB| + +## References + +https://huggingface.co/thanhnguyenvn/opus-mt-en-vi-finetuned-en-to-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_ft_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_ft_2_pipeline_en.md new file mode 100644 index 00000000000000..da0e8c1d43b2f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_ft_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_ft_2_pipeline pipeline MarianTransformer from abdiharyadi +author: John Snow Labs +name: opus_maltese_ft_2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_ft_2_pipeline` is a English model originally trained by abdiharyadi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_ft_2_pipeline_en_5.5.0_3.0_1726533034917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_ft_2_pipeline_en_5.5.0_3.0_1726533034917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_ft_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_ft_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_ft_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|482.7 MB| + +## References + +https://huggingface.co/abdiharyadi/opus-mt-ft-2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-predict_perception_xlmr_cause_object_en.md b/docs/_posts/ahmedlone127/2024-09-17-predict_perception_xlmr_cause_object_en.md new file mode 100644 index 00000000000000..313d6dae5ffbc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-predict_perception_xlmr_cause_object_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_object XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_object +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_object` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_object_en_5.5.0_3.0_1726535278465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_object_en_5.5.0_3.0_1726535278465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_object","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_object", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
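+
+The predicted label for each input row is written to the `class` annotation column configured above. A brief sketch of viewing the text next to its prediction, assuming the column names from the snippet:
+
+```python
+# "class.result" holds the predicted label(s) as an array of strings per row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```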
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_object| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-object \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-predict_perception_xlmr_cause_object_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-predict_perception_xlmr_cause_object_pipeline_en.md new file mode 100644 index 00000000000000..c1167bf77102f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-predict_perception_xlmr_cause_object_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_object_pipeline pipeline XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_object_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_object_pipeline` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_object_pipeline_en_5.5.0_3.0_1726535348235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_object_pipeline_en_5.5.0_3.0_1726535348235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("predict_perception_xlmr_cause_object_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("predict_perception_xlmr_cause_object_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_object_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-object + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-protein_custom_model_veresnoemi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-protein_custom_model_veresnoemi_pipeline_en.md new file mode 100644 index 00000000000000..3588aa8b48ba03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-protein_custom_model_veresnoemi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English protein_custom_model_veresnoemi_pipeline pipeline XlmRoBertaForSequenceClassification from veresnoemi +author: John Snow Labs +name: protein_custom_model_veresnoemi_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`protein_custom_model_veresnoemi_pipeline` is a English model originally trained by veresnoemi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/protein_custom_model_veresnoemi_pipeline_en_5.5.0_3.0_1726536736840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/protein_custom_model_veresnoemi_pipeline_en_5.5.0_3.0_1726536736840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("protein_custom_model_veresnoemi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("protein_custom_model_veresnoemi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|protein_custom_model_veresnoemi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|818.6 MB| + +## References + +https://huggingface.co/veresnoemi/protein_custom_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner_en.md new file mode 100644 index 00000000000000..f9e718db694a9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner RoBertaForTokenClassification from jaggernaut007 +author: John Snow Labs +name: roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner` is a English model originally trained by jaggernaut007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner_en_5.5.0_3.0_1726537495239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner_en_5.5.0_3.0_1726537495239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_clinical_spanish_finetuned_ner_craft_augmentedtransfer_spanish_finetuned_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|450.1 MB| + +## References + +https://huggingface.co/jaggernaut007/roberta-base-biomedical-clinical-es-finetuned-ner-CRAFT_AugmentedTransfer_ES-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_ner_sathyajothi_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_ner_sathyajothi_en.md new file mode 100644 index 00000000000000..e2141ef1acfa52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_ner_sathyajothi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_ner_sathyajothi RoBertaForTokenClassification from sathyajothi +author: John Snow Labs +name: roberta_base_finetuned_ner_sathyajothi +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_ner_sathyajothi` is a English model originally trained by sathyajothi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ner_sathyajothi_en_5.5.0_3.0_1726537828718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ner_sathyajothi_en_5.5.0_3.0_1726537828718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_finetuned_ner_sathyajothi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_finetuned_ner_sathyajothi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_ner_sathyajothi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|425.8 MB| + +## References + +https://huggingface.co/sathyajothi/roberta-base-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_ner_demo_oyunbaatar_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_ner_demo_oyunbaatar_pipeline_mn.md new file mode 100644 index 00000000000000..932ac264558da6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_ner_demo_oyunbaatar_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_oyunbaatar_pipeline pipeline RoBertaForTokenClassification from Oyunbaatar +author: John Snow Labs +name: roberta_base_ner_demo_oyunbaatar_pipeline +date: 2024-09-17 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_oyunbaatar_pipeline` is a Mongolian model originally trained by Oyunbaatar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_oyunbaatar_pipeline_mn_5.5.0_3.0_1726538000124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_oyunbaatar_pipeline_mn_5.5.0_3.0_1726538000124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_demo_oyunbaatar_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_demo_oyunbaatar_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_oyunbaatar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Oyunbaatar/roberta-base-ner-demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_fine_tune_german_ner_de.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_fine_tune_german_ner_de.md new file mode 100644 index 00000000000000..ff0daabec65bfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_fine_tune_german_ner_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German roberta_fine_tune_german_ner RoBertaForTokenClassification from MAbokahf +author: John Snow Labs +name: roberta_fine_tune_german_ner +date: 2024-09-17 +tags: [de, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_fine_tune_german_ner` is a German model originally trained by MAbokahf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_fine_tune_german_ner_de_5.5.0_3.0_1726537664970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_fine_tune_german_ner_de_5.5.0_3.0_1726537664970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_fine_tune_german_ner","de") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_fine_tune_german_ner", "de")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
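+
+Because this is a token-level model, the prediction comes back as one tag per token in the `ner` column. A short sketch of lining the tokens up with their tags, assuming the column names from the snippet above:
+
+```python
+# token.result and ner.result are parallel arrays: one predicted tag per token.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```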
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_fine_tune_german_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|433.3 MB| + +## References + +https://huggingface.co/MAbokahf/roberta-fine-tune-de-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_fine_tune_german_ner_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_fine_tune_german_ner_pipeline_de.md new file mode 100644 index 00000000000000..adec55e2f05dea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_fine_tune_german_ner_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German roberta_fine_tune_german_ner_pipeline pipeline RoBertaForTokenClassification from MAbokahf +author: John Snow Labs +name: roberta_fine_tune_german_ner_pipeline +date: 2024-09-17 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_fine_tune_german_ner_pipeline` is a German model originally trained by MAbokahf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_fine_tune_german_ner_pipeline_de_5.5.0_3.0_1726537699453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_fine_tune_german_ner_pipeline_de_5.5.0_3.0_1726537699453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_fine_tune_german_ner_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_fine_tune_german_ner_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_fine_tune_german_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|433.3 MB| + +## References + +https://huggingface.co/MAbokahf/roberta-fine-tune-de-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_large_metaie_super_academia_gpt4o_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_large_metaie_super_academia_gpt4o_en.md new file mode 100644 index 00000000000000..4be04280a57749 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_large_metaie_super_academia_gpt4o_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_metaie_super_academia_gpt4o RoBertaForTokenClassification from KomeijiForce +author: John Snow Labs +name: roberta_large_metaie_super_academia_gpt4o +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_metaie_super_academia_gpt4o` is a English model originally trained by KomeijiForce. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_super_academia_gpt4o_en_5.5.0_3.0_1726537953075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_super_academia_gpt4o_en_5.5.0_3.0_1726537953075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_metaie_super_academia_gpt4o","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_metaie_super_academia_gpt4o", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_metaie_super_academia_gpt4o| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/KomeijiForce/roberta-large-metaie-super-academia-gpt4o \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_large_metaie_super_academia_gpt4o_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_large_metaie_super_academia_gpt4o_pipeline_en.md new file mode 100644 index 00000000000000..d47ea0bcc7e567 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_large_metaie_super_academia_gpt4o_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_metaie_super_academia_gpt4o_pipeline pipeline RoBertaForTokenClassification from KomeijiForce +author: John Snow Labs +name: roberta_large_metaie_super_academia_gpt4o_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_metaie_super_academia_gpt4o_pipeline` is a English model originally trained by KomeijiForce. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_super_academia_gpt4o_pipeline_en_5.5.0_3.0_1726538025011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_metaie_super_academia_gpt4o_pipeline_en_5.5.0_3.0_1726538025011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_metaie_super_academia_gpt4o_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_metaie_super_academia_gpt4o_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_metaie_super_academia_gpt4o_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/KomeijiForce/roberta-large-metaie-super-academia-gpt4o + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_tagalog_base_ft_udpos213_serbian_pipeline_tl.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_tagalog_base_ft_udpos213_serbian_pipeline_tl.md new file mode 100644 index 00000000000000..7b54250fd1a165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_tagalog_base_ft_udpos213_serbian_pipeline_tl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Tagalog roberta_tagalog_base_ft_udpos213_serbian_pipeline pipeline RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_serbian_pipeline +date: 2024-09-17 +tags: [tl, open_source, pipeline, onnx] +task: Named Entity Recognition +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_serbian_pipeline` is a Tagalog model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_serbian_pipeline_tl_5.5.0_3.0_1726537755379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_serbian_pipeline_tl_5.5.0_3.0_1726537755379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_serbian_pipeline", lang = "tl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_serbian_pipeline", lang = "tl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_serbian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tl| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-sr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline_en.md new file mode 100644 index 00000000000000..38b0b15c96946c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline_en_5.5.0_3.0_1726536284906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline_en_5.5.0_3.0_1726536284906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|884.2 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-AmazonScience_massive_all_1_1_betta-jason + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline_en.md new file mode 100644 index 00000000000000..3f8a423ff66715 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline_en_5.5.0_3.0_1726550825540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline_en_5.5.0_3.0_1726550825540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_uva_chunked_with_synthetic_v2_4_1e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.9 MB| + +## References + +https://huggingface.co/saahith/tiny.en-uva_chunked_with_synthetic_v2-4-1e-05 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-tinymax_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-tinymax_pipeline_en.md new file mode 100644 index 00000000000000..ec335e91631ab8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-tinymax_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tinymax_pipeline pipeline WhisperForCTC from tabsadem +author: John Snow Labs +name: tinymax_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinymax_pipeline` is a English model originally trained by tabsadem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinymax_pipeline_en_5.5.0_3.0_1726549990687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinymax_pipeline_en_5.5.0_3.0_1726549990687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tinymax_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tinymax_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinymax_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|392.1 MB| + +## References + +https://huggingface.co/tabsadem/tinyMAX + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper4_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper4_en.md new file mode 100644 index 00000000000000..a1b52d6601f8c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper4_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper4 WhisperForCTC from khaingsmon +author: John Snow Labs +name: whisper4 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper4` is a English model originally trained by khaingsmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper4_en_5.5.0_3.0_1726546092064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper4_en_5.5.0_3.0_1726546092064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is expected to be a DataFrame with an "audio_content" column
+# holding the raw waveform as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper4","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper4", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
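+
+The `data` DataFrame referenced above is not defined in the snippet; the assembler expects an `audio_content` column containing the waveform as an array of floats, with 16 kHz mono being the usual Whisper input. A hypothetical way to build it with librosa, which is only one option and not required by Spark NLP (the file name is an assumption):
+
+```python
+import librosa
+
+# Load a local WAV file as a 16 kHz mono float waveform.
+waveform, _ = librosa.load("sample.wav", sr=16000)
+
+# One row per recording; depending on your Spark NLP version the column may
+# need to be cast to array<float>.
+data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
+```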
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|393.9 MB| + +## References + +https://huggingface.co/khaingsmon/whisper4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_catalan_pipeline_ca.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_catalan_pipeline_ca.md new file mode 100644 index 00000000000000..7e0a74d9f296c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_catalan_pipeline_ca.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Catalan, Valencian whisper_base_catalan_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_catalan_pipeline +date: 2024-09-17 +tags: [ca, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_catalan_pipeline` is a Catalan, Valencian model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_pipeline_ca_5.5.0_3.0_1726542784525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_pipeline_ca_5.5.0_3.0_1726542784525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_catalan_pipeline", lang = "ca") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_catalan_pipeline", lang = "ca") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_catalan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ca| +|Size:|642.8 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-ca + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chinese_cer_zh.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chinese_cer_zh.md new file mode 100644 index 00000000000000..3cec64e271238d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chinese_cer_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_base_chinese_cer WhisperForCTC from HuangJordan +author: John Snow Labs +name: whisper_base_chinese_cer +date: 2024-09-17 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_chinese_cer` is a Chinese model originally trained by HuangJordan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cer_zh_5.5.0_3.0_1726551274843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cer_zh_5.5.0_3.0_1726551274843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_chinese_cer","zh") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_chinese_cer", "zh")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_chinese_cer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|642.1 MB| + +## References + +https://huggingface.co/HuangJordan/whisper-base-chinese-cer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chuvash_highlr_czech_cs.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chuvash_highlr_czech_cs.md new file mode 100644 index 00000000000000..52d987c3c7d2f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chuvash_highlr_czech_cs.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Czech whisper_base_chuvash_highlr_czech WhisperForCTC from sgangireddy +author: John Snow Labs +name: whisper_base_chuvash_highlr_czech +date: 2024-09-17 +tags: [cs, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: cs +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_chuvash_highlr_czech` is a Czech model originally trained by sgangireddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_chuvash_highlr_czech_cs_5.5.0_3.0_1726549255969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_chuvash_highlr_czech_cs_5.5.0_3.0_1726549255969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_chuvash_highlr_czech","cs") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_chuvash_highlr_czech", "cs")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_chuvash_highlr_czech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|cs| +|Size:|642.5 MB| + +## References + +https://huggingface.co/sgangireddy/whisper-base-cv-highLR-cs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_full_data_aug_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_full_data_aug_v1_pipeline_en.md new file mode 100644 index 00000000000000..854b994090ce84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_full_data_aug_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_full_data_aug_v1_pipeline pipeline WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_base_full_data_aug_v1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_full_data_aug_v1_pipeline` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_aug_v1_pipeline_en_5.5.0_3.0_1726538981316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_aug_v1_pipeline_en_5.5.0_3.0_1726538981316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_full_data_aug_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_full_data_aug_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_full_data_aug_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.6 MB| + +## References + +https://huggingface.co/thanhduycao/whisper-base-full-data-aug-v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_portuguese_zuazo_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_portuguese_zuazo_pipeline_pt.md new file mode 100644 index 00000000000000..4a9eea9113b0cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_portuguese_zuazo_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_base_portuguese_zuazo_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_portuguese_zuazo_pipeline +date: 2024-09-17 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_portuguese_zuazo_pipeline` is a Portuguese model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_zuazo_pipeline_pt_5.5.0_3.0_1726541285731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_zuazo_pipeline_pt_5.5.0_3.0_1726541285731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_portuguese_zuazo_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_portuguese_zuazo_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_portuguese_zuazo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|642.3 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_portuguese_zuazo_pt.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_portuguese_zuazo_pt.md new file mode 100644 index 00000000000000..12253309630283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_portuguese_zuazo_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_base_portuguese_zuazo WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_portuguese_zuazo +date: 2024-09-17 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_portuguese_zuazo` is a Portuguese model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_zuazo_pt_5.5.0_3.0_1726541244845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_zuazo_pt_5.5.0_3.0_1726541244845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_portuguese_zuazo","pt") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_portuguese_zuazo", "pt")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_portuguese_zuazo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|642.3 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_cli_dropout_small_oriya_pipeline_or.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_cli_dropout_small_oriya_pipeline_or.md new file mode 100644 index 00000000000000..eb91c467d00d15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_cli_dropout_small_oriya_pipeline_or.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Oriya (macrolanguage) whisper_cli_dropout_small_oriya_pipeline pipeline WhisperForCTC from auro +author: John Snow Labs +name: whisper_cli_dropout_small_oriya_pipeline +date: 2024-09-17 +tags: [or, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: or +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_cli_dropout_small_oriya_pipeline` is a Oriya (macrolanguage) model originally trained by auro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_cli_dropout_small_oriya_pipeline_or_5.5.0_3.0_1726546430139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_cli_dropout_small_oriya_pipeline_or_5.5.0_3.0_1726546430139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_cli_dropout_small_oriya_pipeline", lang = "or") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_cli_dropout_small_oriya_pipeline", lang = "or") +val annotations = pipeline.transform(df) + +``` +
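+
+As an optional check (not part of the original card, and assuming `PretrainedPipeline` exposes the downloaded PipelineModel through its `model` attribute), the stages of the pipeline can be listed and compared against the Included Models section below:
+
+```python
+# Optional sanity check: the printed stage names should correspond to the Included
+# Models listed below (an AudioAssembler followed by a WhisperForCTC model).
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_cli_dropout_small_oriya_pipeline", lang = "or")
+for stage in pipeline.model.stages:
+    print(type(stage).__name__)
+```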
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_cli_dropout_small_oriya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|or| +|Size:|1.7 GB| + +## References + +https://huggingface.co/auro/whisper-cli-dropout-small-or + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arnw_ar.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arnw_ar.md new file mode 100644 index 00000000000000..e70cb8a7c9a6bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arnw_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arnw WhisperForCTC from neethuvm +author: John Snow Labs +name: whisper_small_arnw +date: 2024-09-17 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arnw` is a Arabic model originally trained by neethuvm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arnw_ar_5.5.0_3.0_1726547453149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arnw_ar_5.5.0_3.0_1726547453149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_arnw","ar") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_arnw", "ar")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arnw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/neethuvm/whisper-small-arnw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arnw_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arnw_pipeline_ar.md new file mode 100644 index 00000000000000..aa338eac314ea7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arnw_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arnw_pipeline pipeline WhisperForCTC from neethuvm +author: John Snow Labs +name: whisper_small_arnw_pipeline +date: 2024-09-17 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arnw_pipeline` is a Arabic model originally trained by neethuvm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arnw_pipeline_ar_5.5.0_3.0_1726547535204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arnw_pipeline_ar_5.5.0_3.0_1726547535204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arnw_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arnw_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arnw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/neethuvm/whisper-small-arnw + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_basque_cv16_1_eu.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_basque_cv16_1_eu.md new file mode 100644 index 00000000000000..533ffa19a9a24c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_basque_cv16_1_eu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Basque whisper_small_basque_cv16_1 WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_small_basque_cv16_1 +date: 2024-09-17 +tags: [eu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_basque_cv16_1` is a Basque model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_basque_cv16_1_eu_5.5.0_3.0_1726543110949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_basque_cv16_1_eu_5.5.0_3.0_1726543110949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_basque_cv16_1","eu") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_basque_cv16_1", "eu")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_basque_cv16_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|eu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zuazo/whisper-small-eu-cv16_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_basque_cv16_1_pipeline_eu.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_basque_cv16_1_pipeline_eu.md new file mode 100644 index 00000000000000..cd0fc973bc879c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_basque_cv16_1_pipeline_eu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Basque whisper_small_basque_cv16_1_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_small_basque_cv16_1_pipeline +date: 2024-09-17 +tags: [eu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_basque_cv16_1_pipeline` is a Basque model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_basque_cv16_1_pipeline_eu_5.5.0_3.0_1726543193190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_basque_cv16_1_pipeline_eu_5.5.0_3.0_1726543193190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_basque_cv16_1_pipeline", lang = "eu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_basque_cv16_1_pipeline", lang = "eu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_basque_cv16_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|eu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zuazo/whisper-small-eu-cv16_1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_bengali_crblp_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_bengali_crblp_pipeline_bn.md new file mode 100644 index 00000000000000..fe6281e6715367 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_bengali_crblp_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_small_bengali_crblp_pipeline pipeline WhisperForCTC from Rakib +author: John Snow Labs +name: whisper_small_bengali_crblp_pipeline +date: 2024-09-17 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_crblp_pipeline` is a Bengali model originally trained by Rakib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_crblp_pipeline_bn_5.5.0_3.0_1726540897228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_crblp_pipeline_bn_5.5.0_3.0_1726540897228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_bengali_crblp_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_bengali_crblp_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_crblp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Rakib/whisper-small-bn-crblp + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_custom300_1e_5_va2000_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_custom300_1e_5_va2000_en.md new file mode 100644 index 00000000000000..c064e46c9e1076 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_custom300_1e_5_va2000_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_custom300_1e_5_va2000 WhisperForCTC from racheltong +author: John Snow Labs +name: whisper_small_custom300_1e_5_va2000 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_custom300_1e_5_va2000` is a English model originally trained by racheltong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_custom300_1e_5_va2000_en_5.5.0_3.0_1726540894962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_custom300_1e_5_va2000_en_5.5.0_3.0_1726540894962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_custom300_1e_5_va2000","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_custom300_1e_5_va2000", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_custom300_1e_5_va2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/racheltong/whisper-small-custom300-1e-5-va2000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_custom300_1e_5_va2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_custom300_1e_5_va2000_pipeline_en.md new file mode 100644 index 00000000000000..0cfdaf6f61fec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_custom300_1e_5_va2000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_custom300_1e_5_va2000_pipeline pipeline WhisperForCTC from racheltong +author: John Snow Labs +name: whisper_small_custom300_1e_5_va2000_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_custom300_1e_5_va2000_pipeline` is a English model originally trained by racheltong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_custom300_1e_5_va2000_pipeline_en_5.5.0_3.0_1726540980409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_custom300_1e_5_va2000_pipeline_en_5.5.0_3.0_1726540980409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_custom300_1e_5_va2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_custom300_1e_5_va2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_custom300_1e_5_va2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/racheltong/whisper-small-custom300-1e-5-va2000 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_divehi_agercas_dv.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_divehi_agercas_dv.md new file mode 100644 index 00000000000000..3641763af1324b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_divehi_agercas_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_agercas WhisperForCTC from agercas +author: John Snow Labs +name: whisper_small_divehi_agercas +date: 2024-09-17 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_agercas` is a Dhivehi, Divehi, Maldivian model originally trained by agercas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_agercas_dv_5.5.0_3.0_1726552940128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_agercas_dv_5.5.0_3.0_1726552940128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_agercas","dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_agercas", "dv")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_agercas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/agercas/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_galician_zuazo_gl.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_galician_zuazo_gl.md new file mode 100644 index 00000000000000..a9a857d48b2085 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_galician_zuazo_gl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Galician whisper_small_galician_zuazo WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_small_galician_zuazo +date: 2024-09-17 +tags: [gl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_galician_zuazo` is a Galician model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_galician_zuazo_gl_5.5.0_3.0_1726547819574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_galician_zuazo_gl_5.5.0_3.0_1726547819574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_galician_zuazo","gl") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_galician_zuazo", "gl")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_galician_zuazo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|gl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zuazo/whisper-small-gl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_howlbz_zh.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_howlbz_zh.md new file mode 100644 index 00000000000000..dacb4e7630569b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_howlbz_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_small_hindi_howlbz WhisperForCTC from howlbz +author: John Snow Labs +name: whisper_small_hindi_howlbz +date: 2024-09-17 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_howlbz` is a Chinese model originally trained by howlbz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_howlbz_zh_5.5.0_3.0_1726538869002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_howlbz_zh_5.5.0_3.0_1726538869002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_howlbz","zh") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_howlbz", "zh")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_howlbz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/howlbz/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_mukund017_hi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_mukund017_hi.md new file mode 100644 index 00000000000000..120b303eb136f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_mukund017_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_mukund017 WhisperForCTC from Mukund017 +author: John Snow Labs +name: whisper_small_hindi_mukund017 +date: 2024-09-17 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_mukund017` is a Hindi model originally trained by Mukund017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_mukund017_hi_5.5.0_3.0_1726547625199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_mukund017_hi_5.5.0_3.0_1726547625199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_mukund017","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_mukund017", "hi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_mukund017| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Mukund017/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_mukund017_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_mukund017_pipeline_hi.md new file mode 100644 index 00000000000000..30a66f20e8cdba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_mukund017_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_mukund017_pipeline pipeline WhisperForCTC from Mukund017 +author: John Snow Labs +name: whisper_small_hindi_mukund017_pipeline +date: 2024-09-17 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_mukund017_pipeline` is a Hindi model originally trained by Mukund017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_mukund017_pipeline_hi_5.5.0_3.0_1726547704957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_mukund017_pipeline_hi_5.5.0_3.0_1726547704957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hindi_mukund017_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hindi_mukund017_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_mukund017_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Mukund017/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hungarian_gyikesz_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hungarian_gyikesz_pipeline_hu.md new file mode 100644 index 00000000000000..aa6f2edbd151cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hungarian_gyikesz_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_small_hungarian_gyikesz_pipeline pipeline WhisperForCTC from gyikesz +author: John Snow Labs +name: whisper_small_hungarian_gyikesz_pipeline +date: 2024-09-17 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hungarian_gyikesz_pipeline` is a Hungarian model originally trained by gyikesz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hungarian_gyikesz_pipeline_hu_5.5.0_3.0_1726551434884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hungarian_gyikesz_pipeline_hu_5.5.0_3.0_1726551434884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hungarian_gyikesz_pipeline", lang = "hu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hungarian_gyikesz_pipeline", lang = "hu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hungarian_gyikesz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/gyikesz/whisper-small-hu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_korean_yspeed_hi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_korean_yspeed_hi.md new file mode 100644 index 00000000000000..0d86fddbe252a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_korean_yspeed_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_korean_yspeed WhisperForCTC from Gummybear05 +author: John Snow Labs +name: whisper_small_korean_yspeed +date: 2024-09-17 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_yspeed` is a Hindi model originally trained by Gummybear05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_yspeed_hi_5.5.0_3.0_1726546997751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_yspeed_hi_5.5.0_3.0_1726546997751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_korean_yspeed","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_korean_yspeed", "hi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_yspeed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Gummybear05/whisper-small-ko-Yspeed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_marathi_steja_mr.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_marathi_steja_mr.md new file mode 100644 index 00000000000000..4cc174acf833ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_marathi_steja_mr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Marathi whisper_small_marathi_steja WhisperForCTC from steja +author: John Snow Labs +name: whisper_small_marathi_steja +date: 2024-09-17 +tags: [mr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_marathi_steja` is a Marathi model originally trained by steja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_marathi_steja_mr_5.5.0_3.0_1726542662539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_marathi_steja_mr_5.5.0_3.0_1726542662539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_marathi_steja","mr") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is assumed to be a Spark DataFrame with an "audio_content" column of float audio samples
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_marathi_steja", "mr")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_marathi_steja| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/steja/whisper-small-marathi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_ndonga_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_ndonga_pipeline_en.md new file mode 100644 index 00000000000000..26e4644426380c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_ndonga_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_ndonga_pipeline pipeline WhisperForCTC from deedax +author: John Snow Labs +name: whisper_small_ndonga_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ndonga_pipeline` is a English model originally trained by deedax. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ndonga_pipeline_en_5.5.0_3.0_1726542272156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ndonga_pipeline_en_5.5.0_3.0_1726542272156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_ndonga_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_ndonga_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
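A slightly fuller sketch of using this pretrained pipeline end to end. The import path is the usual Spark NLP one; `df` is assumed to be a DataFrame with a float-array column named `audio_content`, and the pipeline is assumed to write its transcription to a `text` column, as the standalone model card does:

```python
# Sketch under assumptions: df carries raw 16 kHz audio in an "audio_content" column.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("whisper_small_ndonga_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)  # transcriptions
```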
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ndonga_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/deedax/whisper-small-ng + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_ne2_1_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_ne2_1_en.md new file mode 100644 index 00000000000000..1da5d6029015ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_ne2_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_ne2_1 WhisperForCTC from SonishMaharjan +author: John Snow Labs +name: whisper_small_ne2_1 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ne2_1` is a English model originally trained by SonishMaharjan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ne2_1_en_5.5.0_3.0_1726549632086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ne2_1_en_5.5.0_3.0_1726549632086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_ne2_1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_ne2_1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
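After `transform`, the transcription sits inside the `text` annotation column. A short sketch of pulling it out into plain strings (assuming every row produced at least one annotation):

```python
# Each row of "text" is an array of annotations; "result" holds the transcribed string.
pipelineDF.select("text.result").show(truncate=False)

# Collect into plain Python strings:
transcripts = [row["result"][0] for row in pipelineDF.select("text.result").collect()]
```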
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ne2_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SonishMaharjan/whisper-small-ne2.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_singlish_augmented_again_1200steps_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_singlish_augmented_again_1200steps_en.md new file mode 100644 index 00000000000000..d1cd2a5154838d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_singlish_augmented_again_1200steps_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_singlish_augmented_again_1200steps WhisperForCTC from yy1412 +author: John Snow Labs +name: whisper_small_singlish_augmented_again_1200steps +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_singlish_augmented_again_1200steps` is a English model originally trained by yy1412. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_singlish_augmented_again_1200steps_en_5.5.0_3.0_1726542839734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_singlish_augmented_again_1200steps_en_5.5.0_3.0_1726542839734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_singlish_augmented_again_1200steps","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_singlish_augmented_again_1200steps", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_singlish_augmented_again_1200steps| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yy1412/whisper-small-singlish_augmented_again_1200steps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_sv.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_sv.md new file mode 100644 index 00000000000000..500c2813f34ec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whisper_small_swedish_v4 WhisperForCTC from AdrianHR +author: John Snow Labs +name: whisper_small_swedish_v4 +date: 2024-09-17 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_v4` is a Swedish model originally trained by AdrianHR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_v4_sv_5.5.0_3.0_1726547639629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_v4_sv_5.5.0_3.0_1726547639629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_swedish_v4","sv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_swedish_v4", "sv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AdrianHR/whisper-small-sv-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_train_v2_1_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_train_v2_1_en.md new file mode 100644 index 00000000000000..bd1ed5abe5c11a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_train_v2_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_train_v2_1 WhisperForCTC from xbilek25 +author: John Snow Labs +name: whisper_small_train_v2_1 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_train_v2_1` is a English model originally trained by xbilek25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_train_v2_1_en_5.5.0_3.0_1726552328360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_train_v2_1_en_5.5.0_3.0_1726552328360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_train_v2_1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_train_v2_1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_train_v2_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xbilek25/whisper-small-train-v2.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_train_v2_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_train_v2_1_pipeline_en.md new file mode 100644 index 00000000000000..fe3c29f589c7d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_train_v2_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_train_v2_1_pipeline pipeline WhisperForCTC from xbilek25 +author: John Snow Labs +name: whisper_small_train_v2_1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_train_v2_1_pipeline` is a English model originally trained by xbilek25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_train_v2_1_pipeline_en_5.5.0_3.0_1726552415889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_train_v2_1_pipeline_en_5.5.0_3.0_1726552415889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_train_v2_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_train_v2_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_train_v2_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xbilek25/whisper-small-train-v2.1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_yoruba_kaggle_train_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_yoruba_kaggle_train_en.md new file mode 100644 index 00000000000000..568105afab6975 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_yoruba_kaggle_train_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yoruba_kaggle_train WhisperForCTC from Esammy +author: John Snow Labs +name: whisper_small_yoruba_kaggle_train +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_kaggle_train` is a English model originally trained by Esammy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_kaggle_train_en_5.5.0_3.0_1726546579531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_kaggle_train_en_5.5.0_3.0_1726546579531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_kaggle_train","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_kaggle_train", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_kaggle_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Esammy/whisper-small-yoruba_kaggle_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_yoruba_kaggle_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_yoruba_kaggle_train_pipeline_en.md new file mode 100644 index 00000000000000..acfe983f0b9c71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_yoruba_kaggle_train_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_yoruba_kaggle_train_pipeline pipeline WhisperForCTC from Esammy +author: John Snow Labs +name: whisper_small_yoruba_kaggle_train_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_kaggle_train_pipeline` is a English model originally trained by Esammy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_kaggle_train_pipeline_en_5.5.0_3.0_1726546657462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_kaggle_train_pipeline_en_5.5.0_3.0_1726546657462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_yoruba_kaggle_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_yoruba_kaggle_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_kaggle_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Esammy/whisper-small-yoruba_kaggle_train + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_chinese_zhihcheng_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_chinese_zhihcheng_pipeline_zh.md new file mode 100644 index 00000000000000..30005a6f5345dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_chinese_zhihcheng_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_tiny_chinese_zhihcheng_pipeline pipeline WhisperForCTC from ZhihCheng +author: John Snow Labs +name: whisper_tiny_chinese_zhihcheng_pipeline +date: 2024-09-17 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_zhihcheng_pipeline` is a Chinese model originally trained by ZhihCheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_zhihcheng_pipeline_zh_5.5.0_3.0_1726551468653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_zhihcheng_pipeline_zh_5.5.0_3.0_1726551468653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_chinese_zhihcheng_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_chinese_zhihcheng_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_zhihcheng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|389.6 MB| + +## References + +https://huggingface.co/ZhihCheng/whisper-tiny-zh + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_pipeline_en.md new file mode 100644 index 00000000000000..cad5e8e7fdfefc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_engmed_v2_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_engmed_v2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_engmed_v2_pipeline` is a English model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_engmed_v2_pipeline_en_5.5.0_3.0_1726547766801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_engmed_v2_pipeline_en_5.5.0_3.0_1726547766801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_engmed_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_engmed_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_engmed_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.0 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-engmed-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_ga2en_v1_4_ga.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_ga2en_v1_4_ga.md new file mode 100644 index 00000000000000..c8d52dcc431722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_ga2en_v1_4_ga.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Irish whisper_tiny_ga2en_v1_4 WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_tiny_ga2en_v1_4 +date: 2024-09-17 +tags: [ga, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_ga2en_v1_4` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_ga2en_v1_4_ga_5.5.0_3.0_1726552656173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_ga2en_v1_4_ga_5.5.0_3.0_1726552656173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_ga2en_v1_4","ga") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_ga2en_v1_4", "ga")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
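The Python snippet assumes a running Spark NLP session and the relevant imports. A minimal setup sketch; the import locations follow recent Spark NLP releases and should be adjusted to your environment:

```python
# Minimal session/import sketch (treat module paths as assumptions for your Spark NLP version).
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

spark = sparknlp.start()  # starts (or returns) a SparkSession with Spark NLP on the classpath
```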
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_ga2en_v1_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ga| +|Size:|389.4 MB| + +## References + +https://huggingface.co/ymoslem/whisper-tiny-ga2en-v1.4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_indonesian_evanarlian_id.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_indonesian_evanarlian_id.md new file mode 100644 index 00000000000000..71c1405cf04604 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_indonesian_evanarlian_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_tiny_indonesian_evanarlian WhisperForCTC from evanarlian +author: John Snow Labs +name: whisper_tiny_indonesian_evanarlian +date: 2024-09-17 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_indonesian_evanarlian` is a Indonesian model originally trained by evanarlian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_indonesian_evanarlian_id_5.5.0_3.0_1726542350786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_indonesian_evanarlian_id_5.5.0_3.0_1726542350786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_indonesian_evanarlian","id") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_indonesian_evanarlian", "id")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_indonesian_evanarlian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|390.7 MB| + +## References + +https://huggingface.co/evanarlian/whisper-tiny-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_italian_6_it.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_italian_6_it.md new file mode 100644 index 00000000000000..ad6be9cd057e60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_italian_6_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_tiny_italian_6 WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: whisper_tiny_italian_6 +date: 2024-09-17 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_6` is a Italian model originally trained by GIanlucaRub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_6_it_5.5.0_3.0_1726548029492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_6_it_5.5.0_3.0_1726548029492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_italian_6","it") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_italian_6", "it")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|390.5 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_julienchoukroun_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_julienchoukroun_en.md new file mode 100644 index 00000000000000..f7533be73acd05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_julienchoukroun_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_julienchoukroun WhisperForCTC from JulienChoukroun +author: John Snow Labs +name: whisper_tiny_julienchoukroun +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_julienchoukroun` is a English model originally trained by JulienChoukroun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_julienchoukroun_en_5.5.0_3.0_1726551726997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_julienchoukroun_en_5.5.0_3.0_1726551726997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_julienchoukroun","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_julienchoukroun", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_julienchoukroun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/JulienChoukroun/whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_english_bayerasif_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_english_bayerasif_en.md new file mode 100644 index 00000000000000..595d4e4e1dedff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_english_bayerasif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_bayerasif WhisperForCTC from bayerasif +author: John Snow Labs +name: whisper_tiny_minds14_english_bayerasif +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_bayerasif` is a English model originally trained by bayerasif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_bayerasif_en_5.5.0_3.0_1726538881252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_bayerasif_en_5.5.0_3.0_1726538881252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_bayerasif","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_bayerasif", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
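The fitted `pipelineModel` is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded like any other Spark ML artifact. A short sketch; the output path is an assumption:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and load it back later (path is illustrative).
pipelineModel.write().overwrite().save("/tmp/whisper_tiny_minds14_pipeline")
restored = PipelineModel.load("/tmp/whisper_tiny_minds14_pipeline")
restoredDF = restored.transform(data)
```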
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_bayerasif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/bayerasif/whisper-tiny-minds14-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_english_bayerasif_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_english_bayerasif_pipeline_en.md new file mode 100644 index 00000000000000..64357472f99c52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_english_bayerasif_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_bayerasif_pipeline pipeline WhisperForCTC from bayerasif +author: John Snow Labs +name: whisper_tiny_minds14_english_bayerasif_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_bayerasif_pipeline` is a English model originally trained by bayerasif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_bayerasif_pipeline_en_5.5.0_3.0_1726538906336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_bayerasif_pipeline_en_5.5.0_3.0_1726538906336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_english_bayerasif_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_bayerasif_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_bayerasif_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/bayerasif/whisper-tiny-minds14-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds_malikibrar_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds_malikibrar_en.md new file mode 100644 index 00000000000000..374c575711ddc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds_malikibrar_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds_malikibrar WhisperForCTC from MalikIbrar +author: John Snow Labs +name: whisper_tiny_minds_malikibrar +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds_malikibrar` is a English model originally trained by MalikIbrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_malikibrar_en_5.5.0_3.0_1726538680302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_malikibrar_en_5.5.0_3.0_1726538680302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds_malikibrar","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds_malikibrar", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds_malikibrar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/MalikIbrar/whisper-tiny-minds \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds_malikibrar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds_malikibrar_pipeline_en.md new file mode 100644 index 00000000000000..de4c3969d963b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds_malikibrar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds_malikibrar_pipeline pipeline WhisperForCTC from MalikIbrar +author: John Snow Labs +name: whisper_tiny_minds_malikibrar_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds_malikibrar_pipeline` is a English model originally trained by MalikIbrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_malikibrar_pipeline_en_5.5.0_3.0_1726538702912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_malikibrar_pipeline_en_5.5.0_3.0_1726538702912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds_malikibrar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds_malikibrar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds_malikibrar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/MalikIbrar/whisper-tiny-minds + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_pipeline_es.md new file mode 100644 index 00000000000000..532c81f562db79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish whisper_tiny_spanish_herme_pipeline pipeline WhisperForCTC from herme +author: John Snow Labs +name: whisper_tiny_spanish_herme_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_herme_pipeline` is a Castilian, Spanish model originally trained by herme. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_herme_pipeline_es_5.5.0_3.0_1726550901360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_herme_pipeline_es_5.5.0_3.0_1726550901360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_spanish_herme_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_spanish_herme_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_herme_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|390.8 MB| + +## References + +https://huggingface.co/herme/whisper-tiny-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline_en.md new file mode 100644 index 00000000000000..b55cfb0e5a7d92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline pipeline WhisperForCTC from rohitp1 +author: John Snow Labs +name: workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline` is a English model originally trained by rohitp1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline_en_5.5.0_3.0_1726546145350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline_en_5.5.0_3.0_1726546145350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
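The stage list shown under "Included Models" below can also be checked programmatically. A small sketch, assuming the standard Spark NLP `PretrainedPipeline` wrapper, which exposes the underlying Spark ML `PipelineModel` via `.model`:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline(
    "workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline",
    lang="en",
)
# List the stage classes of the downloaded pipeline.
for stage in pipeline.model.stages:
    print(type(stage).__name__)  # expected: AudioAssembler, WhisperForCTC
```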
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/rohitp1/workstation_whisper_base_finetune_teacher__babble_noise_mozilla_100_epochs_batch_4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_final_mixed_aug_swap_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_final_mixed_aug_swap_en.md new file mode 100644 index 00000000000000..344a556f065d4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_final_mixed_aug_swap_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_swap XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_swap +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_swap` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_swap_en_5.5.0_3.0_1726535426174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_swap_en_5.5.0_3.0_1726535426174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_swap","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_swap", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
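The Python snippet above assumes the usual Spark NLP imports; a brief sketch of those imports and of reading the predicted labels back out (module paths follow recent Spark NLP releases):

```python
# Assumed imports for the snippet above.
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

# Predicted labels end up in the "class" annotation column.
pipelineDF.select("text", "class.result").show(truncate=False)
```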
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_swap| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.4 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_swap \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_operator_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_operator_en.md new file mode 100644 index 00000000000000..5dea8a330d2cc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_operator_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_operator XlmRoBertaForSequenceClassification from DanLee6507 +author: John Snow Labs +name: xlm_roberta_base_operator +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_operator` is a English model originally trained by DanLee6507. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_operator_en_5.5.0_3.0_1726536828125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_operator_en_5.5.0_3.0_1726536828125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_operator","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_operator", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
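Beyond the winning label, classifier annotations usually carry per-label scores in their metadata map. A sketch of reading them; the annotation schema fields (`result`, `metadata`) are the standard Spark NLP ones and should be treated as an assumption for this model:

```python
from pyspark.sql import functions as F

# Explode the "class" annotations and expose the label plus its metadata scores.
scores = (
    pipelineDF.select(F.explode("class").alias("ann"))
    .select(F.col("ann.result").alias("label"), F.col("ann.metadata").alias("scores"))
)
scores.show(truncate=False)
```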
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_operator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|870.6 MB| + +## References + +https://huggingface.co/DanLee6507/xlm-roberta-base-operator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline_en.md new file mode 100644 index 00000000000000..ae1d76cbeffc37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline pipeline XlmRoBertaForSequenceClassification from YaraKyrychenko +author: John Snow Labs +name: xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline` is a English model originally trained by YaraKyrychenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline_en_5.5.0_3.0_1726536068355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline_en_5.5.0_3.0_1726536068355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
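+
+The downloaded `PretrainedPipeline` can also be applied to plain strings through its light-weight `annotate` method; the sketch below assumes the Python example above and that the classifier stage writes to a `class` output column.
+
+```python
+# Run the pipeline on a raw string; the result is a dict keyed by output column name.
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```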
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_ukraine_waray_philippines_pov_v1_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|875.0 MB|
+
+## References
+
+https://huggingface.co/YaraKyrychenko/xlm-roberta-base-ukraine-war-pov-v1
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_vtoc_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_vtoc_100_pipeline_en.md
new file mode 100644
index 00000000000000..54fc411a6966ee
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_vtoc_100_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_vtoc_100_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20
+author: John Snow Labs
+name: xlm_roberta_base_vtoc_100_pipeline
+date: 2024-09-17
+tags: [en, open_source, pipeline, onnx]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForSequenceClassification pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_vtoc_100_pipeline` is an English model originally trained by tmnam20.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vtoc_100_pipeline_en_5.5.0_3.0_1726536081129.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vtoc_100_pipeline_en_5.5.0_3.0_1726536081129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_vtoc_100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_vtoc_100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
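+
+To confirm which annotators the downloaded pipeline wraps (they should match the Included Models list below), one option is to walk its underlying Spark ML `PipelineModel`; this sketch assumes `PretrainedPipeline` exposes it through the `model` attribute.
+
+```python
+# Print the class name of each stage bundled in the pretrained pipeline.
+for stage in pipeline.model.stages:
+    print(type(stage).__name__)
+```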
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_vtoc_100_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|783.1 MB|
+
+## References
+
+https://huggingface.co/tmnam20/xlm-roberta-base-vtoc-100
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_en.md
new file mode 100644
index 00000000000000..3b0ab34b6040ef
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_finetuned_emojis_cen_2 XlmRoBertaForSequenceClassification from Karim-Gamal
+author: John Snow Labs
+name: xlm_roberta_finetuned_emojis_cen_2
+date: 2024-09-17
+tags: [en, open_source, onnx, sequence_classification, xlm_roberta]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForSequenceClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_finetuned_emojis_cen_2` is an English model originally trained by Karim-Gamal.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_cen_2_en_5.5.0_3.0_1726536670104.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_cen_2_en_5.5.0_3.0_1726536670104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_cen_2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+# Assemble raw text into documents, tokenize, then classify each document.
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_cen_2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+// Assemble raw text into documents, tokenize, then classify each document.
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
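+
+Each classification annotation also carries a `metadata` map which, for sequence classifiers such as this one, typically stores the per-label confidence scores; a hedged sketch assuming the Python example above has been run:
+
+```python
+# Show the predicted label next to the raw metadata (per-class scores).
+pipelineDF.selectExpr("explode(class) as prediction") \
+    .selectExpr("prediction.result as label", "prediction.metadata as scores") \
+    .show(truncate=False)
+```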
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_finetuned_emojis_cen_2|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[class]|
+|Language:|en|
+|Size:|1.0 GB|
+
+## References
+
+https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-cen-2
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline_en.md
new file mode 100644
index 00000000000000..35c62155ae0e82
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer
+author: John Snow Labs
+name: xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline
+date: 2024-09-17
+tags: [en, open_source, pipeline, onnx]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForSequenceClassification pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline` is an English model originally trained by vocabtrimmer.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline_en_5.5.0_3.0_1726535626907.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline_en_5.5.0_3.0_1726535626907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_v_base_tweet_sentiment_portuguese_trimmed_portuguese_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|647.0 MB|
+
+## References
+
+https://huggingface.co/vocabtrimmer/xlm-v-base-tweet-sentiment-pt-trimmed-pt
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_1986_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_1986_test1000_en.md
new file mode 100644
index 00000000000000..7b5ac6e8181ca9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_1986_test1000_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlmr_sinhalese_english_all_shuffled_1986_test1000 XlmRoBertaForSequenceClassification from patpizio
+author: John Snow Labs
+name: xlmr_sinhalese_english_all_shuffled_1986_test1000
+date: 2024-09-17
+tags: [en, open_source, onnx, sequence_classification, xlm_roberta]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForSequenceClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_sinhalese_english_all_shuffled_1986_test1000` is an English model originally trained by patpizio.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test1000_en_5.5.0_3.0_1726535648490.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test1000_en_5.5.0_3.0_1726535648490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_sinhalese_english_all_shuffled_1986_test1000","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+# Assemble raw text into documents, tokenize, then classify each document.
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_sinhalese_english_all_shuffled_1986_test1000", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+// Assemble raw text into documents, tokenize, then classify each document.
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
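+
+Because `pipelineModel` is an ordinary Spark ML `PipelineModel`, it can be reused to score any DataFrame that has a `text` column; a minimal sketch assuming the example above has been run:
+
+```python
+# Score a new batch of texts with the already-fitted pipeline.
+more_data = spark.createDataFrame([["First example"], ["Second example"]]).toDF("text")
+pipelineModel.transform(more_data).select("text", "class.result").show(truncate=False)
+```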
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlmr_sinhalese_english_all_shuffled_1986_test1000|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[class]|
+|Language:|en|
+|Size:|814.4 MB|
+
+## References
+
+https://huggingface.co/patpizio/xlmr-si-en-all_shuffled-1986-test1000
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline_en.md
new file mode 100644
index 00000000000000..c410676ab487f2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio
+author: John Snow Labs
+name: xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline
+date: 2024-09-17
+tags: [en, open_source, pipeline, onnx]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForSequenceClassification pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline` is an English model originally trained by patpizio.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1726535772841.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1726535772841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: input DataFrame; the pipeline's DocumentAssembler stage is assumed to read from a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlmr_sinhalese_english_all_shuffled_1986_test1000_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|814.4 MB|
+
+## References
+
+https://huggingface.co/patpizio/xlmr-si-en-all_shuffled-1986-test1000
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-your_model_name_en.md b/docs/_posts/ahmedlone127/2024-09-17-your_model_name_en.md
new file mode 100644
index 00000000000000..39b26b62e6535a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-your_model_name_en.md
@@ -0,0 +1,86 @@
+---
+layout: model
+title: English your_model_name BertForQuestionAnswering from utkuozuak
+author: John Snow Labs
+name: your_model_name
+date: 2024-09-17
+tags: [en, open_source, onnx, question_answering, bert]
+task: Question Answering
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForQuestionAnswering
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `your_model_name` is an English model originally trained by utkuozuak.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_model_name_en_5.5.0_3.0_1726544721783.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_model_name_en_5.5.0_3.0_1726544721783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import BertForQuestionAnswering
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("your_model_name","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+# Assemble the question/context pair, then extract the answer span from the context.
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("your_model_name", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+// Assemble the question/context pair, then extract the answer span from the context.
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
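+
+To pull the extracted answer back out of `pipelineDF`, a minimal sketch (assuming the Python example above has been run and the span classifier keeps the `answer` output column configured there):
+
+```python
+# "answer.result" holds the answer span(s) extracted from the context.
+pipelineDF.select("answer.result").show(truncate=False)
+```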
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|your_model_name|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document_question, document_context]|
+|Output Labels:|[answer]|
+|Language:|en|
+|Size:|412.3 MB|
+
+## References
+
+https://huggingface.co/utkuozuak/your_model_name
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-17-your_model_name_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-your_model_name_pipeline_en.md
new file mode 100644
index 00000000000000..69a19f7b43dac5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-17-your_model_name_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English your_model_name_pipeline pipeline BertForQuestionAnswering from utkuozuak
+author: John Snow Labs
+name: your_model_name_pipeline
+date: 2024-09-17
+tags: [en, open_source, pipeline, onnx]
+task: Question Answering
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForQuestionAnswering pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `your_model_name_pipeline` is an English model originally trained by utkuozuak.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_model_name_pipeline_en_5.5.0_3.0_1726544741254.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_model_name_pipeline_en_5.5.0_3.0_1726544741254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame providing the question/context input columns expected by the
+# pipeline's MultiDocumentAssembler stage.
+pipeline = PretrainedPipeline("your_model_name_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame providing the question/context input columns expected by the
+// pipeline's MultiDocumentAssembler stage.
+val pipeline = new PretrainedPipeline("your_model_name_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
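+
+Because this pipeline starts with a `MultiDocumentAssembler`, a (question, context) pair can also be passed straight through the light-pipeline interface; this sketch assumes `fullAnnotate` accepts the optional second target, as in recent Spark NLP releases.
+
+```python
+# Run the QA pipeline on a single question/context pair without building a DataFrame.
+result = pipeline.fullAnnotate("What framework do I use?", "I use spark-nlp.")
+print(result[0]["answer"])
+```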
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_model_name_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/utkuozuak/your_model_name + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file