From a63c06da8d32a9ae65cc626d8e7ff5f8433fdce8 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Fri, 5 Jul 2024 20:54:25 +0700
Subject: [PATCH 1/4] Add model 2024-07-05-phi2_7b_en

---
 .../ahmedlone127/2024-07-05-phi2_7b_en.md | 80 +++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md

diff --git a/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md b/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
new file mode 100644
index 00000000000000..fd840b2a96c779
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
@@ -0,0 +1,80 @@
+---
+layout: model
+title: Phi2 text-to-text model 7b int8
+author: John Snow Labs
+name: phi2_7b
+date: 2024-07-05
+tags: [phi2, en, llm, open_source, openvino]
+task: Text Generation
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: openvino
+annotator: Phi2Transformer
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained phi2 model, adapted and imported into Spark NLP.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi2_7b_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi2_7b_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Assumes an active Spark NLP session available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Phi2Transformer

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

phi2 = Phi2Transformer \
    .pretrained() \
    .setMaxOutputLength(50) \
    .setDoSample(False) \
    .setInputCols(["document"]) \
    .setOutputCol("phi2_generation")

pipeline = Pipeline().setStages([documentAssembler, phi2])
data = spark.createDataFrame([["Who is the founder of Spark-NLP?"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val phi2 = Phi2Transformer
  .pretrained()
  .setMaxOutputLength(50)
  .setDoSample(false)
  .setInputCols("document")
  .setOutputCol("phi2_generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, phi2))
val data = Seq("Who is the founder of Spark-NLP?").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
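The `setDoSample(False)` call above selects deterministic, greedy decoding rather than sampling. As a rough, library-independent sketch of the difference (plain Python with an invented toy vocabulary and scores, not the Spark NLP API):

```python
import math
import random

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_token(vocab, scores, do_sample=False, rng=None):
    """Greedy pick when do_sample is False; otherwise sample from softmax."""
    probs = softmax(scores)
    if not do_sample:
        # Greedy decoding: always take the highest-probability token,
        # so the output is reproducible run after run.
        return vocab[max(range(len(probs)), key=probs.__getitem__)]
    rng = rng or random.Random(0)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["Spark", "NLP", "founder", "."]
scores = [2.0, 0.5, 3.1, 0.1]
print(pick_next_token(vocab, scores, do_sample=False))  # prints "founder"
```

With `do_sample=True` the same scores yield varying tokens across runs, which is why deterministic decoding is the safer default for reproducible pipelines.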
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|phi2_7b|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|9.1 GB|
\ No newline at end of file

From ba10bc846957e61333ec1b6a3452f8aaaac53e00 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Fri, 12 Jul 2024 10:29:16 +0700
Subject: [PATCH 2/4] Add model 2024-07-12-bart_large_cnn_en

---
 .../2024-07-12-bart_large_cnn_en.md | 71 +++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md

diff --git a/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md b/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
new file mode 100644
index 00000000000000..8c8329cca822b3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: BART (large-sized model), fine-tuned on CNN Daily Mail
+author: John Snow Labs
+name: bart_large_cnn
+date: 2024-07-12
+tags: [bart, bartsummarization, cnn, text_to_text, en, open_source, tensorflow]
+task: Summarization
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: tensorflow
+annotator: BartTransformer
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+BART model pre-trained on English, and fine-tuned on CNN Daily Mail. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in [this repository](https://github.com/pytorch/fairseq/tree/master/examples/bart).
+
+Disclaimer: The team releasing BART did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+Model description
+BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
+
+BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering). This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754028322.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754028322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.annotator import BartTransformer

bart = BartTransformer.pretrained("bart_large_cnn") \
    .setTask("summarize:") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("summaries")
```
```scala
val bart = BartTransformer.pretrained("bart_large_cnn")
  .setTask("summarize:")
  .setMaxOutputLength(200)
  .setInputCols("documents")
  .setOutputCol("summaries")
```
</div>
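The denoising pre-training objective described above — corrupt the input text, then train the model to reconstruct it — can be sketched in plain Python. The specific corruption choices below are illustrative stand-ins, not BART's exact noising recipe:

```python
import random

def corrupt(tokens, rng, mask_prob=0.3):
    """Toy noising function: mask random tokens and permute sentence order,
    loosely mimicking BART's text-infilling and sentence-permutation noise."""
    # (1) Corrupt: replace some tokens with a mask symbol.
    masked = [t if rng.random() > mask_prob else "<mask>" for t in tokens]
    # Split on '.' boundaries and shuffle the resulting sentence chunks.
    sentences, current = [], []
    for t in masked:
        current.append(t)
        if t == ".":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    rng.shuffle(sentences)
    return [t for sent in sentences for t in sent]

rng = random.Random(42)
original = "the cat sat . the dog ran .".split()
noisy = corrupt(original, rng)
# (2) The seq2seq model is then trained to map `noisy` back to `original`.
print(noisy)
```

The reconstruction step is what makes the pre-trained decoder a strong text generator, which is why the checkpoint fine-tunes well for summarization.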
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bart_large_cnn|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents]|
+|Output Labels:|[generation]|
+|Language:|en|
+|Size:|974.9 MB|
\ No newline at end of file

From c7d88ff49ec563764717649b699eda4b3f57236a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Fri, 12 Jul 2024 10:41:51 +0700
Subject: [PATCH 3/4] Add model 2024-07-12-bart_large_cnn_en

---
 docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md b/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
index 8c8329cca822b3..560ddafed55ed3 100644
--- a/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
+++ b/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
@@ -31,8 +31,8 @@ BART is particularly effective when fine-tuned for text generation (e.g. summari
 {:.btn-box}
 
 
-[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754028322.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
-[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754028322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754758442.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754758442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
 
 ## How to use
 

From a053e3d4bce678e357d199d7f50fda1930bce936 Mon Sep 17 00:00:00 2001
From: Maziyar Panahi
Date: Fri, 12 Jul 2024 15:59:05 +0200
Subject: [PATCH 4/4] Update 2024-07-05-phi2_7b_en.md

---
 docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md b/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
index fd840b2a96c779..0b640e7fa5e161 100644
--- a/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
+++ b/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
@@ -2,7 +2,7 @@ layout: model
 title: Phi2 text-to-text model 7b int8
 author: John Snow Labs
-name: phi2_7b
+name: phi2
 date: 2024-07-05
 tags: [phi2, en, llm, open_source, openvino]
 task: Text Generation
@@ -24,8 +24,8 @@ Pretrained phi2 model, adapted and imported into Spark NLP.
 
 {:.btn-box}
 
-[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi2_7b_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
-[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi2_7b_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi2_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi2_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
 
 ## How to use
 
@@ -72,9 +72,9 @@ val pipelineDF = pipelineModel.transform(data)
 
 {:.table-model}
 |---|---|
-|Model Name:|phi2_7b|
+|Model Name:|phi2|
 |Compatibility:|Spark NLP 5.4.0+|
 |License:|Open Source|
 |Edition:|Official|
 |Language:|en|
-|Size:|9.1 GB|
\ No newline at end of file
+|Size:|9.1 GB|