From 8fe1748d423a6e5862d670d04c6035e43aef0718 Mon Sep 17 00:00:00 2001
From: jsl-models <74001263+jsl-models@users.noreply.github.com>
Date: Fri, 12 Jul 2024 20:59:29 +0700
Subject: [PATCH] 2024-07-05-phi2_7b_en (#14339)

* Add model 2024-07-05-phi2_7b_en

* Add model 2024-07-12-bart_large_cnn_en

* Add model 2024-07-12-bart_large_cnn_en

* Update 2024-07-05-phi2_7b_en.md

---------

Co-authored-by: ahmedlone127
Co-authored-by: Maziyar Panahi
---
 .../ahmedlone127/2024-07-05-phi2_7b_en.md | 80 +++++++++++++++++++
 .../2024-07-12-bart_large_cnn_en.md       | 71 ++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
 create mode 100644 docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md

diff --git a/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md b/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
new file mode 100644
index 00000000000000..0b640e7fa5e161
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-07-05-phi2_7b_en.md
@@ -0,0 +1,80 @@
+---
+layout: model
+title: Phi2 text-to-text model 7b int8
+author: John Snow Labs
+name: phi2
+date: 2024-07-05
+tags: [phi2, en, llm, open_source, openvino]
+task: Text Generation
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: openvino
+annotator: Phi2Transformer
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Phi-2 text-to-text model, adapted and imported into Spark NLP.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi2_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi2_en_5.4.0_3.0_1720187078320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Phi2Transformer
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

phi2 = Phi2Transformer \
    .pretrained() \
    .setMaxOutputLength(50) \
    .setDoSample(False) \
    .setInputCols(["document"]) \
    .setOutputCol("phi2_generation")

pipeline = Pipeline().setStages([documentAssembler, phi2])
data = spark.createDataFrame([["Who is the founder of Spark-NLP?"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.Phi2Transformer
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val phi2 = Phi2Transformer
  .pretrained()
  .setMaxOutputLength(50)
  .setDoSample(false)
  .setInputCols("document")
  .setOutputCol("phi2_generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, phi2))
val data = Seq("Who is the founder of Spark-NLP?").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
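The example above calls `setDoSample(False)`, which tells the transformer to decode greedily: at each step it emits the single highest-probability token instead of sampling from the distribution. A minimal framework-free sketch of that difference is below; the toy `next_token_logits` function and three-token vocabulary are illustrative assumptions, not part of Spark NLP.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def decode(next_token_logits, steps, do_sample, rng=None):
    """Decode `steps` tokens from a toy model.

    next_token_logits(tokens) -> one logit per vocabulary id.
    With do_sample=False this is greedy (deterministic) decoding.
    """
    tokens = []
    for _ in range(steps):
        probs = softmax(next_token_logits(tokens))
        if do_sample:
            # Sample a token id in proportion to its probability.
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        else:
            # Greedy: always take the argmax.
            token = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(token)
    return tokens

# Toy "model" over a 3-token vocabulary: strongly prefers token 2.
toy = lambda tokens: [1.0, 0.0, 3.0]

greedy = decode(toy, steps=4, do_sample=False)
print(greedy)  # [2, 2, 2, 2]
```

Because greedy decoding is deterministic, repeated runs of the pipeline on the same input produce the same generation, which is usually what you want for reproducible examples.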
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|phi2|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|9.1 GB|
diff --git a/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md b/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
new file mode 100644
index 00000000000000..560ddafed55ed3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-07-12-bart_large_cnn_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: BART (large-sized model), fine-tuned on CNN Daily Mail
+author: John Snow Labs
+name: bart_large_cnn
+date: 2024-07-12
+tags: [bart, bartsummarization, cnn, text_to_text, en, open_source, tensorflow]
+task: Summarization
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: tensorflow
+annotator: BartTransformer
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+BART model pre-trained on the English language and fine-tuned on CNN Daily Mail. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in [this repository](https://github.com/pytorch/fairseq/tree/master/examples/bart).
+
+Disclaimer: The team releasing BART did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+## Model description
+
+BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
+
+BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering).
This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754758442.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bart_large_cnn_en_5.4.0_3.0_1720754758442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.annotator import BartTransformer

bart = BartTransformer.pretrained("bart_large_cnn") \
    .setTask("summarize:") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("summaries")

```
```scala
import com.johnsnowlabs.nlp.annotators.seq2seq.BartTransformer

val bart = BartTransformer.pretrained("bart_large_cnn")
  .setTask("summarize:")
  .setMaxOutputLength(200)
  .setInputCols("documents")
  .setOutputCol("summaries")

```
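The pre-training objective described above, corrupting text with a noising function and learning to reconstruct the original, can be illustrated with a toy corruptor. This is a simplified stand-in for the paper's actual noising functions: the span lengths, mask ratio, and `<mask>` token below are illustrative assumptions, not the real BART recipe.

```python
import random

def text_infilling(tokens, rng, mask_ratio=0.3, mean_span=3):
    """Toy BART-style text infilling.

    Random spans of tokens are each replaced by a single <mask> token.
    A model being pre-trained would receive the corrupted sequence as
    input and be trained to reconstruct the original `tokens`.
    """
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < mask_ratio:
            # Collapse a whole span (possibly several tokens) into one <mask>,
            # so the model must also predict how many tokens are missing.
            span = max(1, int(rng.gauss(mean_span, 1)))
            out.append("<mask>")
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out

rng = random.Random(7)
original = "the cat sat on the mat and purred loudly".split()
corrupted = text_infilling(original, rng)
print(corrupted)  # a (possibly shorter) sequence with spans collapsed to <mask>
```

The reconstruction target is always the uncorrupted sequence, which is what makes the fine-tuned model good at conditional generation tasks such as the summarization shown in this card.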
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bart_large_cnn|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents]|
+|Output Labels:|[generation]|
+|Language:|en|
+|Size:|974.9 MB|
\ No newline at end of file